Robust rule-based parsing
(quick overview)

I. Robustness
II. Three robust rule-based parsers of English
III. Common features
IV. Example: identification of subjects in Syntex

I. Robustness
(Aït-Mokhtar et al. 1997)

« the ability to provide useful analyses for real-world input text. By useful analyses, we mean
analyses that are (at least partially) correct and
usable in some automatic task or application »

→ implies:

- at least one analysis (even partial) for any real-world input
- ability to process irregular input and to overcome analysis errors
- efficiency

I. Types of robust parsers
(Aït-Mokhtar et al. 1997)

- parsers based on traditional theoretical models, with rule-based and/or stochastic post-processing
  → Minipar (Lin 1995)
- stochastic parsers
  → Charniak's parser (2000)
- rule-based parsers
  - Non-Projective Dependency Parser (Tapanainen & Järvinen 1997)
  - Syntex (Bourigault 2007)
  - Cass (Abney 1990, 1995)

→ most parsers are hybrid

II.1 Non-Projective Dependency Parser
(Tapanainen & Järvinen 1997)

Processing pipeline:

Tagged Text
→ Syntactic Labeling: « all legitimate surface-syntactic labels are added to the set of morphological readings »
→ Selection of syntactic links: « syntactic rules discard contextually illegitimate alternatives or select legitimate ones »
→ Pruning: general heuristics disambiguate the last of the syntactic links
→ OUTPUT

Resource: valency and subcategorization information

II.1 Non-Projective Dependency Parser
(Tapanainen & Järvinen 1997)

- Rules establish dependency links between words
- Rules are contextual:

  SELECT (@SUBJ)
  IF (1C AUXMOD HEAD);

  Example: How do you do? (SUBJ link between « you » and the AUX « do »)

  If the preceding word is an unambiguous auxiliary,
  the current word is the subject of this auxiliary

- Rules use syntactic links established by preceding rules

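As a rough illustration of how such a contextual rule might operate, the Python sketch below follows the gloss given above: a word that still carries @SUBJ among its candidate labels keeps only that label when the preceding word is unambiguously an auxiliary. The `Token` class, the tag names (`AUX`, `PRON`, ...) and the function `select_subj_after_aux` are assumptions made for this example; they are not the parser's actual data structures or rule engine.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Token:
    form: str
    pos: Set[str]               # remaining part-of-speech readings (hypothetical tag names)
    labels: Set[str]            # remaining candidate syntactic labels
    head: Optional[int] = None  # index of the governor, once a link is established


def select_subj_after_aux(tokens):
    """Keep only @SUBJ on a word whose left neighbour is an unambiguous auxiliary."""
    for i in range(1, len(tokens)):
        tok, prev = tokens[i], tokens[i - 1]
        if "@SUBJ" in tok.labels and prev.pos == {"AUX"}:
            tok.labels = {"@SUBJ"}  # discard the competing labels
            tok.head = i - 1        # link the subject to the auxiliary
    return tokens


# "How do you do ?" with invented readings for the sake of the example
sentence = [
    Token("How", {"ADV"}, {"@ADVL"}),
    Token("do", {"AUX"}, {"@AUX"}),
    Token("you", {"PRON"}, {"@SUBJ", "@OBJ"}),
    Token("do", {"V"}, {"@MAIN"}),
]
select_subj_after_aux(sentence)
print(sentence[2].labels, sentence[2].head)
```

The rule only fires when the auxiliary reading is the single remaining one, which mirrors the "unambiguous" condition: an ambiguous neighbour leaves the candidate labels untouched for later rules.
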
II.2 Syntex
(Bourigault 2007)

Processing pipeline:

Tagged Text
→ Verb Chunk: he will leave
→ non-recursive NP: the man, happy tree friends
→ non-recursive SP (prepositional phrase): from Paris
→ Object, Subject: This is the man
→ Prep Attachment: This is the man from Paris
→ OUTPUT

Resource: endogenous and exogenous subcategorization information (used at two points in the pipeline)

II.2 Syntex (Bourigault 2007)

- One module per syntactic relation
- Each module processes the sentence from left to right
- Like the Non-Projective Dependency Parser, the rules
  - establish dependency relations between words
  - are contextual
  - use syntactic links established by preceding rules
- The identification of a dependency link is formulated as a «path» to be followed up through the existing links and grammatical categories
  - from governor to dependent or from dependent to governor
- Ambiguous relations: selection of potential governors + disambiguation with probabilities

Example: Those who think they are interested in water supply must vote

II.3 Cass (Abney 1990, 1995)

Processing pipeline:

Tagged Text
→ CHUNK FILTER (NP filter, Chunk filter)
   Non-recursive chunks; internal structure remains ambiguous
   [NP the happy tree friends]
   [VP will leave]
   [SP from [NP the happy tree friends]]
→ CLAUSE FILTER (Raw Clause filter, Clause Repair filter)
   Raw Clause filter: subject-predicate relation; beginning and end of simplex clauses
   [SUBJ This] [PRED is] [NP the man] [SP from Paris]
   Clause Repair filter: repair if no subject-predicate relation
→ PARSE FILTER
   Assembles recursive structures
   [[This] [is] [NP the man] [SP from Paris]]
→ OUTPUT

Resource: subcategorization information

II.3 Cass (Abney 1990, 1995)

- Each filter uses transducers:
  PP → (Prep|To)+ (NP|Vbg)

- Use of repair (also used in Syntex and NPDP, but less explicitly):
  - Each filter makes a decision (determinism), the safest one in case of ambiguity
    « ambiguity is not propagated downstream »
  - « when errors become apparent downstream, the parser attempts to repair them »
  - « repair consists in directly modifying erroneous structure without regard to the history of computation that produced the structure »

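The pattern PP → (Prep|To)+ (NP|Vbg) can be read as a regular expression over category labels. The Python sketch below applies it to a flat list of category names and groups each match into a PP. The function name `bracket_pp` and the output encoding (a tuple for each recognised PP) are made up for this illustration and do not reflect Cass's actual implementation.

```python
import re

# One or more Prep/To labels followed by an NP or a Vbg label.
# Labels are serialised with a trailing space so that the regex works on whole labels.
PP_PATTERN = re.compile(r"(?:\b(?:Prep|To) )+(?:NP|Vbg) ")


def bracket_pp(categories):
    """Group every (Prep|To)+ (NP|Vbg) run into a ("PP", [...]) tuple."""
    text = "".join(label + " " for label in categories)
    result, position = [], 0
    for match in PP_PATTERN.finditer(text):
        result.extend(text[position:match.start()].split())  # labels left untouched
        result.append(("PP", match.group(0).split()))         # the recognised PP
        position = match.end()
    result.extend(text[position:].split())
    return result


print(bracket_pp(["NP", "VP", "Prep", "NP"]))
```

Because each match is consumed once and never revisited, the sketch is deterministic in the sense described above: it commits to one bracketing per pass and leaves any correction to a later stage.
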
II.3 Cass (Abney 1990, 1995)

Example of repair

In South Australia beds of boulders were deposited …

- Erroneous structure output from the Chunk filter:
  [SP In [NP South Australia beds]] [SP of [NP boulders]] [VP were deposited]
- Raw Clause filter: no subject is found
- Repair filter tries to find a subject by modifying the structure:
  [SP In [NP South Australia]] [NP-SUBJ beds] [SP of [NP boulders]] [VP were deposited]

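The repair step on this example can be sketched in Python as a direct modification of the chunk list, without looking at how the chunks were produced. The heuristic used here (detach the last word of an over-long prepositional chunk and relabel it as subject) is a deliberately simplified assumption chosen to reproduce this one example; it is not Abney's actual repair procedure, and `repair_missing_subject` is a hypothetical name.

```python
def repair_missing_subject(chunks):
    """chunks: list of (label, words). Returns a repaired copy if no subject precedes the VP."""
    labels = [label for label, _ in chunks]
    if "NP-SUBJ" in labels or "VP" not in labels:
        return list(chunks)  # nothing to repair
    verb_position = labels.index("VP")
    for i in range(verb_position):
        label, words = chunks[i]
        # Hypothetical heuristic: a preposition followed by at least two words
        # may have swallowed the head noun of the subject.
        if label == "SP" and len(words) >= 3:
            return (chunks[:i]
                    + [("SP", words[:-1]), ("NP-SUBJ", words[-1:])]
                    + chunks[i + 1:])
    return list(chunks)


chunks = [
    ("SP", ["In", "South", "Australia", "beds"]),
    ("SP", ["of", "boulders"]),
    ("VP", ["were", "deposited"]),
]
for label, words in repair_missing_subject(chunks):
    print(label, " ".join(words))
```
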
III. Common features: incrementality

- The parsing task is divided into subtasks
  → reduces the overall complexity of the main task:
  « factoring the problem into a sequence of small, well defined questions » (Abney 1990).
- The sentence is parsed in several phases, each phase producing an intermediate structure
  → allows each phase to use the syntactic information left by the preceding phase
  « the level of abstraction produced during the 1st phase (...) facilitates the description of deeper syntactic relations » (Aït-Mokhtar et al. 1997)
  → ease of maintenance
- Problem of circularity: it is difficult to choose in what order the relations should be identified (Bourigault 2007)

III. Common features: determinism and repair

- Each parsing phase yields one solution
  - In case of ambiguity, the safest choice is made, even if some higher-level information is needed
  - Ambiguity is not propagated downstream
- Most regular errors can be repaired later on
- ≠ parallelism, backtracking

« The salient performance is not errors vs no errors, but the tradeoff between speed and error rate » (Abney 1990)

III. Common features: no syntactic theory

- Difference between:
  - the theoretical study of the syntactic structures of language
  - the automatic identification of grammatical relations in real-world texts
- Difficulties in automatic syntactic analysis:
  - lack of knowledge (semantics/pragmatics for disambiguation)
  - deviation from the norm of the language
  - errors of preceding processing steps
- Use of common grammatical knowledge
- Hours of corpus observation to find clues for automatic identification

III. Common features: implicit grammatical knowledge

- Bipartite architecture:
  - Lexical information
  - Recognition routines
- No independent declaration of grammatical knowledge
- Difficult / impossible to set apart:
  - Grammatical knowledge
  - Non-grammar-based heuristics
  → No linguist / computer scientist job separation
  → Need for both linguistic and programming know-how
- A condition for scalability and robustness

IV. Example: the subject relation in Syntex

The identification of the subject relation is formulated as a «path» through the already identified grammatical relations:

- start from the tensed verb
- move to the left
- stop when you encounter an ungoverned Noun

Example:

Word        Tag
the         Det
cost        Noun
of          Prep
technology  Noun
takes       Verb (TENSED VERB)
time        Noun
to          Prep
shrink      Noun

Links already identified: DET, PREP, NOMPREP, OBJ, PREP, NOMPREP
Link found by the path: SUJ (SUBJECT), between the tensed verb « takes » and the ungoverned Noun « cost »

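The three steps of the path (start from the tensed verb, move left, stop at an ungoverned Noun) can be sketched in a few lines of Python. The parallel-list representation, the function name `find_subject` and the exact governor indices given to each word are assumptions made for this example; they only approximate the links drawn on the slide.

```python
def find_subject(tags, governor, verb_index):
    """Walk leftwards from the tensed verb and return the index of the first ungoverned Noun."""
    for i in range(verb_index - 1, -1, -1):
        if tags[i] == "Noun" and governor[i] is None:
            return i
    return None  # no candidate subject found


words = ["the", "cost", "of", "technology", "takes", "time", "to", "shrink"]
tags = ["Det", "Noun", "Prep", "Noun", "Verb", "Noun", "Prep", "Noun"]
# governor[i] is the index of the word governing word i, or None if it is still ungoverned
# (hypothetical encoding of the DET, PREP, NOMPREP and OBJ links)
governor = [1, None, 1, 2, None, 4, 5, 6]

subject = find_subject(tags, governor, verb_index=words.index("takes"))
print(words[subject])
```

Here « technology » is skipped because it is already governed by the preposition « of », so the walk continues until it reaches « cost ».
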
IV. Using existing links

- The subject might be far from the tensed verb
- Many configurations are possible:

  Initiatives leading to cessation of smoking in workplaces are adopted
  (annotated islands: Gerund, PP, PP)

  Those who think they are interested in water supply must vote.
  (annotated islands: Clause, Clause, PP)

  No reference to the war, or to the alliance, should remain
  (annotated islands: PP, Conj, PP)

- Existing links form dependency islands (~syntagms or isolated words)
- Following up the islands until a reasonable subject is found makes it possible to find subjects without describing all possible configurations or doing too much computing

IV. Ambiguities

- Many persons have died in Darfur since the conflict began
- A person sitting on the death row since the age of 16 is not the same as before.
- Many adults believe education equates intelligence.
- Those who think they are interested in water supply must vote.

When to stop? When to follow up? When to repair?

IV. Path decomposition

At each island, a decision is made by a dedicated sub-module (one type of island = one sub-module):

- follow up to the island on the left
- stop and identify a subject
  - without repair
  - with repair
- change path direction
  - to the right
  - to any other position in the sentence
- call another module
- stop and return failure

Decisions are encoded as if-then rules that may test:

- local and non-local context: lemmas, morphosyntactic tags, links, presence of commas…
- specific information left by other modules: encountered tags, activated modules…

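The idea of one sub-module per island type can be sketched as a dispatch table in Python. Only three of the decisions listed above are modelled (follow up, stop with a subject, stop with failure), and each sub-module is reduced to a single unconditional rule. The names `Decision`, `MODULES` and `walk_islands` are assumptions for this illustration, not Syntex's actual code.

```python
from enum import Enum


class Decision(Enum):
    FOLLOW_LEFT = "follow up to the island on the left"
    STOP_SUBJECT = "stop and identify a subject"
    FAIL = "stop and return failure"


# One sub-module per island type. Real rules would test lemmas, tags, links, commas, etc.
def pp_module(island):
    return Decision.FOLLOW_LEFT


def gerund_module(island):
    return Decision.FOLLOW_LEFT


def np_module(island):
    return Decision.STOP_SUBJECT


MODULES = {"PP": pp_module, "Gerund": gerund_module, "NP": np_module}


def walk_islands(islands):
    """islands: (type, text) pairs to the left of the tensed verb, in sentence order."""
    for island in reversed(islands):  # start next to the verb, move left
        module = MODULES.get(island[0])
        decision = module(island) if module else Decision.FAIL
        if decision is Decision.STOP_SUBJECT:
            return island[1]
        if decision is Decision.FAIL:
            return None
    return None


islands = [
    ("NP", "Initiatives"),
    ("Gerund", "leading to cessation"),
    ("PP", "of smoking"),
    ("PP", "in workplaces"),
]
print(walk_islands(islands))
```

Adding a new configuration then means adding or refining one sub-module rather than enumerating every possible sequence of islands.
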
IV. Path example: following up

Korea who we believe to have WMD is safe from us.

- Link: SUBJ
- Annotated islands: Clause, PP
- Modules: PP module, Clause module
- Pattern: _ RelPron [[SUJ Pron] Verb]

IV. Path example: repair

Many adults believe education equates intelligence.

- Links: OBJ, SUBJ
- Annotated island: Clause
- Module: Clause module
- Pattern: ## [[SUBJ NP] Verb [[OBJ [SUBJ NP] Verb ] Verb OBJ NP]]

IV. Path example: sub-module call

On the walls were scarlet banners

- Link: SUBJ
- Annotated island: PP
- Modules: PP module, Wall module, InvertedSubject module
- Pattern: ## [PP] Verb NP _

IV. Path example: change path

On the contrary, war hysteria was continuous and deliberate, and acts such as looting, murdering, the slaughters of prisoners, were considered as normal.

- Annotations: Conj, Adj, Noun
- Modules: PP module, Clause module, PP module, Commas module
- Commas module: +2.6 recall, -0.07 precision

Other examples:

- All three political Parties at the federal level, and certainly at the provincial level in different sections, have parity clauses.
- Although no directive was ever issued, it was known that the chief of the Department intended that within one week no reference to the war with Eurasia, or to the alliance, should remain

IV. Evaluation on Susanne Corpus

             Tensed verb identification   Subject identification     Subject relation
             (TreeTagger)                 (if tensed verb correct)   (correct tensed verb and correct subject)
precision    94.87                        94.56                      89.51
recall       89.76                        90.84                      81.53
F-measure    92.24                        92.66                      85.33

- Shallow subjects evaluation only
- The following are not identified or evaluated:
  - I've never seen the dog hiding his bones.
  - She wants me to clean my shoes
  - The book is read by the boy

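The F-measure figures are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). A short Python check, assuming that balanced definition (the slides do not state which F-measure variant was used); the names `f_measure` and `ROWS` are introduced for this example only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) for each column of the evaluation table
ROWS = {
    "tensed verb identification": (94.87, 89.76, 92.24),
    "subject identification": (94.56, 90.84, 92.66),
    "subject relation": (89.51, 81.53, 85.33),
}

for name, (p, r, reported) in ROWS.items():
    print(f"{name}: computed {f_measure(p, r):.2f}, reported {reported:.2f}")
```
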
Bibliography

- Abney (1990): « Rapid Incremental Parsing with Repair », Proceedings of the 6th New OED Conference, University of Waterloo, Waterloo, Ontario.
- Abney (1995): « Partial Parsing with finite state cascade », Natural Language Engineering, Cambridge University Press.
  www.sfs.uni-tuebingen.de/~abney/StevenAbney.html#cass
- Aït-Mokhtar et al. (1997): « Incremental Finite State Parsing », Proceedings of the ANLP-97, Washington.
- Bourigault (2007): Syntex, analyseur syntaxique opérationnel, Thèse d'Habilitation à Diriger les Recherches, Université Toulouse - Le Mirail.
  w3.univ-tlse2.fr/erss/textes/pagespersos/bourigault/syntex.html
- Charniak (2000): « A maximum-entropy-inspired parser », in Proceedings of the North American Chapter of the Association for Computational Linguistics, pp. 132–139.
  http://www.cfilt.iitb.ac.in/~anupama/charniak.php
- Lin (1995): « Dependency-based Evaluation of Minipar », Proceedings of JCAI.
  http://www.cs.ualberta.ca/~lindek/downloads.htm
- Tapanainen & Järvinen (1997): « A Dependency Parser for English », Technical Reports, No. TR-1, Department of General Linguistics, University of Helsinki, March 1997.
  www.connexor.com

TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Evaluation Corpus: ftp://ftp.cs.umanitoba.ca/pub/lindek/depeval

More Related Content

What's hot

[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
Hiroki Shimanaka
 
AINL 2016: Nikolenko
AINL 2016: NikolenkoAINL 2016: Nikolenko
AINL 2016: Nikolenko
Lidia Pivovarova
 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
Andre Freitas
 
FinalReport
FinalReportFinalReport
FinalReport
Vinh Xuan Ho
 
AINL 2016: Malykh
AINL 2016: MalykhAINL 2016: Malykh
AINL 2016: Malykh
Lidia Pivovarova
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
Lidia Pivovarova
 
Context, Perspective, and Generalities in a Knowledge Ontology
Context, Perspective, and Generalities in a Knowledge OntologyContext, Perspective, and Generalities in a Knowledge Ontology
Context, Perspective, and Generalities in a Knowledge Ontology
Mike Bergman
 
AINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoAINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, Couto
Lidia Pivovarova
 
Introduction to automata
Introduction to automataIntroduction to automata
Introduction to automata
Shubham Bansal
 
Python Tutorial Questions part-1
Python Tutorial Questions part-1Python Tutorial Questions part-1
Python Tutorial Questions part-1
Srinimf-Slides
 
Er
ErEr
A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...
Ilia Karpov
 
[Emnlp] what is glo ve part i - towards data science
[Emnlp] what is glo ve  part i - towards data science[Emnlp] what is glo ve  part i - towards data science
[Emnlp] what is glo ve part i - towards data science
Nikhil Jaiswal
 
Ceis 3
Ceis 3Ceis 3
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
Surya Sg
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
Ali Kabbadj
 
Computational model language and grammar bnf
Computational model language and grammar bnfComputational model language and grammar bnf
Computational model language and grammar bnf
Taha Shakeel
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and Terminology
Steven Miller
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
RIILP
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Estelle Delpech
 

What's hot (20)

[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
 
AINL 2016: Nikolenko
AINL 2016: NikolenkoAINL 2016: Nikolenko
AINL 2016: Nikolenko
 
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
 
FinalReport
FinalReportFinalReport
FinalReport
 
AINL 2016: Malykh
AINL 2016: MalykhAINL 2016: Malykh
AINL 2016: Malykh
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
 
Context, Perspective, and Generalities in a Knowledge Ontology
Context, Perspective, and Generalities in a Knowledge OntologyContext, Perspective, and Generalities in a Knowledge Ontology
Context, Perspective, and Generalities in a Knowledge Ontology
 
AINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoAINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, Couto
 
Introduction to automata
Introduction to automataIntroduction to automata
Introduction to automata
 
Python Tutorial Questions part-1
Python Tutorial Questions part-1Python Tutorial Questions part-1
Python Tutorial Questions part-1
 
Er
ErEr
Er
 
A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...
 
[Emnlp] what is glo ve part i - towards data science
[Emnlp] what is glo ve  part i - towards data science[Emnlp] what is glo ve  part i - towards data science
[Emnlp] what is glo ve part i - towards data science
 
Ceis 3
Ceis 3Ceis 3
Ceis 3
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
 
Computational model language and grammar bnf
Computational model language and grammar bnfComputational model language and grammar bnf
Computational model language and grammar bnf
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and Terminology
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
 

Viewers also liked

Вертикальная интеграция вычислительных архитектур - Проблема медиатора
Вертикальная интеграция вычислительных архитектур - Проблема медиатораВертикальная интеграция вычислительных архитектур - Проблема медиатора
Вертикальная интеграция вычислительных архитектур - Проблема медиатора
Yehor Churilov
 
Marca Personal
Marca PersonalMarca Personal
Marca Personal
Cams Lopez
 
BenKuchlerportfolioFINAL_lowres
BenKuchlerportfolioFINAL_lowresBenKuchlerportfolioFINAL_lowres
BenKuchlerportfolioFINAL_lowres
Kuchler, Ben
 
Tratamientos
TratamientosTratamientos
Tratamientos
Brenda Caraguay
 
JRuby in Java Projects
JRuby in Java ProjectsJRuby in Java Projects
JRuby in Java Projects
jazzman1980
 
An Introduction to Networks
An Introduction to NetworksAn Introduction to Networks
An Introduction to Networks
Francesco Gadaleta
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011
Dmitry Kan
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...
Dmitry Kan
 
Finnish business angel activity 2015
Finnish business angel activity 2015Finnish business angel activity 2015
Finnish business angel activity 2015
FiBAN
 
groovy rules
groovy rulesgroovy rules
groovy rules
Paul King
 

Viewers also liked (10)

Вертикальная интеграция вычислительных архитектур - Проблема медиатора
Вертикальная интеграция вычислительных архитектур - Проблема медиатораВертикальная интеграция вычислительных архитектур - Проблема медиатора
Вертикальная интеграция вычислительных архитектур - Проблема медиатора
 
Marca Personal
Marca PersonalMarca Personal
Marca Personal
 
BenKuchlerportfolioFINAL_lowres
BenKuchlerportfolioFINAL_lowresBenKuchlerportfolioFINAL_lowres
BenKuchlerportfolioFINAL_lowres
 
Tratamientos
TratamientosTratamientos
Tratamientos
 
JRuby in Java Projects
JRuby in Java ProjectsJRuby in Java Projects
JRuby in Java Projects
 
An Introduction to Networks
An Introduction to NetworksAn Introduction to Networks
An Introduction to Networks
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...
 
Finnish business angel activity 2015
Finnish business angel activity 2015Finnish business angel activity 2015
Finnish business angel activity 2015
 
groovy rules
groovy rulesgroovy rules
groovy rules
 

Similar to Robust rule-based parsing

Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
butest
 
05 handbook summ-hovy
05 handbook summ-hovy05 handbook summ-hovy
05 handbook summ-hovy
Sagar Dabhi
 
Bondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary DetectorBondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary Detector
butest
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
butest
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
butest
 
Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguistics
Ahmad Mashhood
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
How to write a paper
How to write a paperHow to write a paper
How to write a paper
Irika Widiasanti
 
Strict intersection types for the lambda calculus
Strict intersection types for the lambda calculusStrict intersection types for the lambda calculus
Strict intersection types for the lambda calculus
unyil96
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015
Conor McGrory
 
Chapter5
Chapter5Chapter5
Chapter5
orengomoises
 
Chapter5
Chapter5Chapter5
Chapter5
orengomoises
 
Chapter5
Chapter5Chapter5
Chapter5
orengomoises
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
Ajay Ohri
 
P. e. bland, rings and their modules
P. e. bland, rings and their modulesP. e. bland, rings and their modules
P. e. bland, rings and their modules
khudair al fauudi
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
Mario Flecha
 
Automatic Profiling Of Learner Texts
Automatic Profiling Of Learner TextsAutomatic Profiling Of Learner Texts
Automatic Profiling Of Learner Texts
Jeff Nelson
 
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
dannyijwest
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
Marina Santini
 
Ma
MaMa
Ma
anesah
 

Similar to Robust rule-based parsing (20)

Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
05 handbook summ-hovy
05 handbook summ-hovy05 handbook summ-hovy
05 handbook summ-hovy
 
Bondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary DetectorBondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary Detector
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
 
Monoton-working version-1995.doc
Monoton-working version-1995.docMonoton-working version-1995.doc
Monoton-working version-1995.doc
 
Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguistics
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
How to write a paper
How to write a paperHow to write a paper
How to write a paper
 
Strict intersection types for the lambda calculus
Strict intersection types for the lambda calculusStrict intersection types for the lambda calculus
Strict intersection types for the lambda calculus
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015
 
Chapter5
Chapter5Chapter5
Chapter5
 
Chapter5
Chapter5Chapter5
Chapter5
 
Chapter5
Chapter5Chapter5
Chapter5
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
 
P. e. bland, rings and their modules
P. e. bland, rings and their modulesP. e. bland, rings and their modules
P. e. bland, rings and their modules
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
 
Automatic Profiling Of Learner Texts
Automatic Profiling Of Learner TextsAutomatic Profiling Of Learner Texts
Automatic Profiling Of Learner Texts
 
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
Ma
MaMa
Ma
 

More from Estelle Delpech

Génération automatique de texte
Génération automatique de texteGénération automatique de texte
Génération automatique de texte
Estelle Delpech
 
Identification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxIdentification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieux
Estelle Delpech
 
Découverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des LanguesDécouverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des Langues
Estelle Delpech
 
Invited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis awardInvited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis awardEstelle Delpech
 
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Estelle Delpech
 
Identification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieuxIdentification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieux
Estelle Delpech
 
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Estelle Delpech
 
Nomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchNomao: data analysis for personalized local search
Nomao: data analysis for personalized local search
Estelle Delpech
 
Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)
Estelle Delpech
 
Nomao: local search and recommendation engine
Nomao: local search and recommendation engineNomao: local search and recommendation engine
Nomao: local search and recommendation engine
Estelle Delpech
 
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Estelle Delpech
 
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Estelle Delpech
 
Applicative evaluation of bilingual terminologies
Applicative evaluation of bilingual terminologiesApplicative evaluation of bilingual terminologies
Applicative evaluation of bilingual terminologies
Estelle Delpech
 
Évaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeÉvaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialisée
Estelle Delpech
 
R&D Lingua et Machina
R&D Lingua et MachinaR&D Lingua et Machina
R&D Lingua et Machina
Estelle Delpech
 
Bilingual terminology mining
Bilingual terminology miningBilingual terminology mining
Bilingual terminology mining
Estelle Delpech
 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling Algorithm
Estelle Delpech
 
Text Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringText Processing for Procedural Question Answering
Text Processing for Procedural Question Answering
Estelle Delpech
 

Robust rule-based parsing

  • 1. Robust rule-based parsing (quick overview)
      I. Robustness
      II. Three robust rule-based parsers of English
      III. Common features
      IV. Example: identification of subjects in Syntex
  • 2. I. Robustness (Aït-Mokhtar et al. 1997)
      « the ability to provide useful analyses for real-world input text. By useful analyses, we mean analyses that are (at least partially) correct and usable in some automatic task or application »
      This implies:
      - one analysis (even partial) for any real-world input
      - the ability to process irregular input and to overcome analysis errors
      - efficiency
  • 3. I. Types of robust parsers (Aït-Mokhtar et al. 1997)
      - parsers based on traditional theoretical models, with rule-based and/or stochastic post-processing: Minipar (Lin 1995)
      - stochastic parsers: Charniak's parser (2000)
      - rule-based parsers: Non-Projective Dependency Parser (Järvinen & Tapanainen 1997), Syntex (Bourigault 2007), Cass (Abney 1990, 1995)
      Most parsers are hybrid.
  • 4. II.1 Non-Projective Dependency Parser (Tapanainen & Järvinen 1997)
      Pipeline: Tagged text → Syntactic labeling → Selection of syntactic links → Pruning → Output
      Resource: valency and subcategorization information
      - Syntactic labeling: « all legitimate surface-syntactic labels are added to the set of morphological readings »
      - Selection of syntactic links: « syntactic rules discard contextually illegitimate alternatives or select legitimate ones »
      - Pruning: general heuristics disambiguate the last of the syntactic links
  • 5. II.1 Non-Projective Dependency Parser (Tapanainen & Järvinen 1997)
      - Rules establish dependency links between words
      - Rules are contextual:
          SELECT (@SUBJ) IF (1C AUXMOD HEAD);
        Example: "How do you do?", where "do" is labelled AUX and "you" receives the SUBJ link.
        If the preceding word is an unambiguous auxiliary, the current word is the subject of this auxiliary.
      - Rules use syntactic links established by preceding rules
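The contextual rule on slide 5 can be illustrated with a minimal Python sketch. It follows the slide's prose gloss (the preceding word is an unambiguous auxiliary) and is not the parser's actual rule engine: the `Token` class, the reading labels other than `@SUBJ`, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Token:
    form: str
    readings: Set[str]           # candidate labels still alive for this token
    head: Optional[int] = None   # index of the governor, once a link is selected

def select_subj_after_aux(tokens):
    """SELECT-style rule: if the preceding token is unambiguously AUX,
    keep only @SUBJ on the current token and link it to that auxiliary."""
    for i in range(1, len(tokens)):
        prev, cur = tokens[i - 1], tokens[i]
        if prev.readings == {"AUX"} and "@SUBJ" in cur.readings:
            cur.readings = {"@SUBJ"}   # discard the competing readings
            cur.head = i - 1           # dependency link to the auxiliary
    return tokens

# "How do you do ?" with hypothetical ambiguous readings on "you"
sentence = [
    Token("How", {"ADV"}),
    Token("do", {"AUX"}),
    Token("you", {"@SUBJ", "@OBJ"}),
    Token("do", {"V"}),
]
select_subj_after_aux(sentence)
```

After the call, "you" keeps only the `@SUBJ` reading and points to the auxiliary at index 1; later rules could then test that link, which is the sense in which rules use links established by preceding rules.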
  • 6. II.2 Syntex (Bourigault 2007)
      Pipeline: Tagged text → Verb chunk → non-recursive NP → non-recursive SP (prepositional phrase) → Object, Subject → Prep attachment → Output
      Resource: endogenous and exogenous subcategorization information
      Examples shown on the slide:
      - Verb chunk: he will leave
      - non-recursive NP: the man; happy tree friends
      - non-recursive SP: from Paris
      - Object, Subject: This is the man
      - Prep attachment: This is the man from Paris (the candidate attachments are marked "?")
  • 7. II.2 Syntex (Bourigault 2007)
      - One module per syntactic relation. Each module processes the sentence from left to right.
      - Like the Non-Projective Dependency Parser, the rules:
          - establish dependency relations between words
          - are contextual
          - use syntactic links established by preceding rules
      - The identification of a dependency link is formulated as a « path » to be followed through the existing links and grammatical categories:
          - from governor to dependent
          - or from dependent to governor
      - Ambiguous relations: selection of potential governors + disambiguation with probabilities
      Example: Those who think they are interested in water supply must vote
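The two-step treatment of ambiguous relations on slide 7 (select potential governors, then disambiguate with probabilities) might be sketched as follows. This is only an illustration under stated assumptions: the counts are invented toy values, and estimating the probability as a relative frequency of (governor, preposition) pairs is a simplification chosen here, not a description of how Syntex computes its probabilities.

```python
from collections import Counter

def attach(prep, candidates, counts):
    """Pick, among the candidate governors, the one with the highest
    estimated probability of governing the preposition `prep`."""
    def prob(governor):
        total = sum(n for (g, _), n in counts.items() if g == governor)
        return counts[(governor, prep)] / total if total else 0.0
    return max(candidates, key=prob)

# Toy (governor lemma, preposition) counts, invented for this example.
toy_counts = Counter({
    ("man", "from"): 3, ("man", "with"): 1,
    ("be", "from"): 1, ("be", "in"): 9,
})

# "This is the man from Paris": the candidate governors are "be" and "man".
best = attach("from", ["be", "man"], toy_counts)
```

With these toy counts, "man" is preferred (3/4 against 1/10 for "be").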
  • 8. II.3 Cass (Abney 1990, 1995)
      Pipeline: Tagged text → CHUNK FILTER → CLAUSE FILTER → PARSE FILTER → Output
      - CHUNK FILTER (NP filter, Chunk filter): non-recursive chunks; internal structure remains ambiguous
          [NP the happy tree friends] [VP will leave] [SP from [NP the happy tree friends]]
      - CLAUSE FILTER (Raw Clause filter, Clause Repair filter): subject-predicate relation; beginning and end of simplex clauses; repair if no subject-predicate relation is found
          [SUBJ This] [PRED is] [NP the man] [SP from Paris]
      - PARSE FILTER (uses subcategorization information): assembles recursive structures
          [[This] [is] [NP the man] [SP from Paris]]
  • 9. II.3 Cass (Abney 1990, 1995)
      - Each filter uses transducers:
          PP → (Prep|To)+ (NP|Vbg)
      - Each filter makes a decision (determinism), the safest one in case of ambiguity: « ambiguity is not propagated downstream »
      - Use of repair (also used in Syntex and NPDP, but less explicitly):
          « when errors become apparent downstream, the parser attempts to repair them »
          « repair consists in directly modifying erroneous structure without regard to the history of computation that produced the structure »
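The transducer pattern on slide 9 can be approximated by a small cascade of regular expressions over tag sequences. This sketch is not Cass itself: the `apply_level` helper, the NP pattern, and the reading of "+" as "one or more" are assumptions made for illustration; only the PP pattern comes from the slide.

```python
import re

def apply_level(nodes, name, pattern):
    """One cascade level: every match of `pattern` (a regex over
    space-terminated labels) is grouped into a single node labelled `name`."""
    labels = "".join(label + " " for label, _ in nodes)
    out, done = [], 0                      # done = next node not yet copied
    for m in re.finditer(pattern, labels):
        start = labels[:m.start()].count(" ")   # character offset -> node index
        end = labels[:m.end()].count(" ")
        out.extend(nodes[done:start])
        out.append((name, nodes[start:end]))
        done = end
    out.extend(nodes[done:])
    return out

# (?<!\S) forces a match to begin at the start of a label.
NP = r"(?<!\S)(?:Det )?(?:Adj )*(?:Noun )+"      # hypothetical NP pattern
PP = r"(?<!\S)(?:Prep |To )+(?:NP |Vbg )"        # PP -> (Prep|To)+ (NP|Vbg)

tagged = [("Prep", "from"), ("Det", "the"), ("Adj", "happy"),
          ("Noun", "tree"), ("Noun", "friends")]
level1 = apply_level(tagged, "NP", NP)   # [Prep, NP]
level2 = apply_level(level1, "PP", PP)   # [PP]
```

Each level commits to one grouping and hands its output to the next level, so no alternative analyses are carried downstream.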
  • 10. II.3 Cass (Abney 1990, 1995)
      Example of repair: In South Australia beds of boulders were deposited …
      - Erroneous structure output by the Chunk filter:
          [SP In [NP South Australia beds]] [SP of [NP boulders]] [VP were deposited]
      - Raw Clause filter: no subject is found
      - Repair filter tries to find a subject by modifying the structure:
          [SP In [NP South Australia]] [NP-SUBJ beds] [SP of [NP boulders]] [VP were deposited]
  • 11. III. Common features: incrementality
      - The parsing task is divided into subtasks. This reduces the overall complexity of the main task: « factoring the problem into a sequence of small, well defined questions » (Abney 1990).
      - The sentence is parsed in several phases, each phase producing an intermediate structure. This allows each phase to use the syntactic information left by the preceding phase: « the level of abstraction produced during the 1st phase (...) facilitates the description of deeper syntactic relations » (Aït-Mokhtar et al. 1997)
      - Ease of maintenance
      - Problem of circularity: it is difficult to choose in what order the relations should be identified (Bourigault 2007)
  • 12. III. Common features: determinism and repair
      - Each parsing phase yields one solution:
          - in case of ambiguity, the safest choice is made, even if some higher-level information is needed
          - ambiguity is not propagated downstream
          - most regular errors can be repaired later on
      - This contrasts with parallelism and backtracking.
      « The salient performance is not errors vs no errors, but the tradeoff between speed and error rate » (Abney 1990)
  • 13. III. Common features: no syntactic theory
      - Difference between:
          - the theoretical study of the syntactic structures of language
          - the automatic identification of grammatical relations in real-world texts
      - Difficulties in automatic syntactic analysis:
          - lack of knowledge (semantics/pragmatics for disambiguation)
          - deviation from the norm of the language
          - errors of preceding processing steps
      - Use of common grammatical knowledge
      - Hours of corpus observation to find clues for automatic identification
  • 14. III. Common features: implicit grammatical knowledge
      - Bipartite architecture:
          - lexical information
          - recognition routines
      - No independent declaration of grammatical knowledge. It is difficult or impossible to set apart:
          - grammatical knowledge
          - non grammar-based heuristics
      - No linguist/computer scientist job separation: both linguistic and programming know-how are needed. This is a condition for scalability and robustness.
  • 15. IV. Example: the subject relation in Syntex
      The identification of the subject relation is formulated as a « path » through the already identified grammatical relations:
      - start from the tensed verb
      - move to the left
      - stop when you encounter an ungoverned Noun
      Example: the/Det cost/Noun of/Prep technology/Noun takes/Verb time/Noun to/Prep shrink/Noun
      - existing links: DET, PREP and NOMPREP inside "the cost of technology"; OBJ, PREP and NOMPREP to the right of the tensed verb "takes"
      - link found by the path: SUJ (subject) between "cost" and the tensed verb "takes"
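The three steps of the path on slide 15 can be written as a short Python sketch. The token representation (dictionaries with `cat` and `head` keys) and the head indices below are assumptions made for illustration; the real module tests much richer context.

```python
def find_subject(tokens, verb_index):
    """Start from the tensed verb, move to the left, and stop at the first
    ungoverned Noun; link it to the verb as its subject."""
    for i in range(verb_index - 1, -1, -1):
        token = tokens[i]
        if token["cat"] == "Noun" and token["head"] is None:
            token["head"], token["rel"] = verb_index, "SUBJ"
            return i
    return None   # no ungoverned noun to the left of the verb

# "the cost of technology takes ...": "head" is the index of the governor
# set by earlier modules (hypothetical encoding of the DET/PREP/NOMPREP links).
tokens = [
    {"form": "the", "cat": "Det", "head": 1},
    {"form": "cost", "cat": "Noun", "head": None},
    {"form": "of", "cat": "Prep", "head": 1},
    {"form": "technology", "cat": "Noun", "head": 2},
    {"form": "takes", "cat": "Verb", "head": None},
]
subject_index = find_subject(tokens, 4)
```

The noun "technology" is skipped because it is already governed by "of", so the path stops at "cost".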
  • 16. IV. Using existing links
      - The subject might be far from the tensed verb
      - Many configurations are possible:
          - Initiatives leading to cessation of smoking in workplaces are adopted (islands: Gerund, PP, PP)
          - Those who think they are interested in water supply must vote. (islands: Clause, Clause, PP)
          - No reference to the war, or to the alliance, should remain (islands: PP, Conj, PP)
      - Existing links form dependency islands (~ syntagms or isolated words)
      - Following up the islands until a reasonable subject is found makes it possible to find subjects without describing all possible configurations or doing too much computing
  • 17. IV. Ambiguities
      - Many persons have died in Darfur since the conflict began
      - A person sitting on the death row since the age of 16 is not the same as before.
      - Many adults believe education equates intelligence.
      - Those who think they are interested in water supply must vote.
      When to stop? When to follow up? When to repair?
  • 18. IV. Path decomposition
      At each island, a decision is made by a dedicated sub-module (one type of island = one sub-module):
      - follow up to the island on the left
      - stop and identify a subject
          - without repair
          - with repair
      - change path direction
          - to the right
          - to any other position in the sentence
      - call another module
      - stop and return failure
      Decisions are encoded as if-then rules that may test:
      - local and non-local context: lemmas, morphosyntactic (ms) tags, links, presence of commas…
      - specific information left by other modules: encountered tags, activated modules…
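The dispatch described on slide 18 (one sub-module per island type, each returning a decision) could be sketched as below. Only three of the listed decisions are modelled (follow up, stop with a subject, failure); the `Island` and `Decision` types, the sub-module names, and their trivial rules are hypothetical, and the island segmentation of the example is an assumption.

```python
from typing import NamedTuple, Optional

class Island(NamedTuple):
    kind: str   # island type, e.g. "NP", "PP", "Gerund"
    text: str

class Decision(NamedTuple):
    action: str                    # "follow", "stop" or "fail"
    subject: Optional[str] = None

def pp_module(island):
    return Decision("follow")      # a PP cannot be the subject: keep moving left

def gerund_module(island):
    return Decision("follow")

def np_module(island):
    return Decision("stop", island.text)   # ungoverned NP: take it as subject

SUBMODULES = {"PP": pp_module, "Gerund": gerund_module, "NP": np_module}

def subject_path(islands_left_of_verb):
    """Walk the islands from the tensed verb leftwards, letting the
    sub-module for each island type decide what to do next."""
    for island in reversed(islands_left_of_verb):
        module = SUBMODULES.get(island.kind)
        if module is None:
            return None                    # unknown island type: failure
        decision = module(island)
        if decision.action == "stop":
            return decision.subject
        if decision.action != "follow":
            return None
    return None

# "Initiatives leading to cessation of smoking in workplaces are adopted"
islands = [Island("NP", "Initiatives"), Island("Gerund", "leading"),
           Island("PP", "to cessation of smoking"), Island("PP", "in workplaces")]
subject = subject_path(islands)
```

Keeping one function per island type means a new configuration is handled by adding or refining a sub-module rather than by enumerating whole-sentence patterns.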
  • 19. IV. Path example: following up
      Sentence: Korea who we believe to have WMD is safe from us.
      Islands: Clause, PP
      Modules: PP module, Clause module
      Clause pattern: _ RelPron [[SUJ Pron] Verb]
      Result: SUBJ link found by following up past the clause
  • 20. IV. Path example: repair
      Sentence: Many adults believe education equates intelligence.
      Island: Clause
      Module: Clause module
      Pattern: ## [[SUBJ NP] Verb [[OBJ [SUBJ NP] Verb ] Verb OBJ NP]]
      Links involved: OBJ, SUBJ
  • 21. IV. Path example: sub-module call
      Sentence: On the walls were scarlet banners
      Island: PP
      Modules: PP module, Wall module, InvertedSubject module
      Pattern: ## [PP] Verb NP
      Result: SUBJ link
  • 22. IV. Path example: change path
      Sentence: On the contrary, war hysteria was continuous and deliberate, and acts such as looting, murdering, the slaughters of prisoners, were considered as normal.
      Islands and modules: PP module, Clause module, Conj, Adj, Noun, PP module, Commas module
      Figures shown on the slide: +2.6 recall, -0.07 precision
      Other examples:
      - All three political Parties at the federal level, and certainly at the provincial level in different sections, have parity clauses.
      - Although no directive was ever issued, it was known that the chief of the Department intended that within one week no reference to the war with Eurasia, or to the alliance, should remain
  • 23. IV. Evaluation on the Susanne Corpus
                    Tensed verb identification   Subject identification     Subject relation
                    (TreeTagger)                 (if tensed verb correct)   (correct tensed verb and correct subject)
      precision     94.87                        94.56                      89.51
      recall        89.76                        90.84                      81.53
      f-measure     92.24                        92.66                      85.33
      Only shallow subjects are evaluated. The following are not identified or evaluated:
      - I've never seen the dog hiding his bones.
      - She wants me to clean my shoes
      - The book is read by the boy
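The f-measure row on slide 23 is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the reported precision and recall values; the function and dictionary names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported on the slide, in percent
reported = {
    "tensed verb identification": (94.87, 89.76),
    "subject identification": (94.56, 90.84),
    "subject relation": (89.51, 81.53),
}
scores = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}
```

The recomputed values are 92.24, 92.66 and 85.33, matching the slide.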
  • 24. Bibliography
      - Abney (1990): « Rapid Incremental Parsing with Repair », Proceedings of the 6th New OED Conference, University of Waterloo, Waterloo, Ontario.
      - Abney (1995): « Partial Parsing with finite state cascade », Natural Language Engineering, Cambridge University Press. www.sfs.uni-tuebingen.de/~abney/StevenAbney.html#cass
      - Aït-Mokhtar et al. (1997): « Incremental Finite State Parsing », Proceedings of the ANLP-97, Washington.
      - Bourigault (2007): Syntex, analyseur syntaxique opérationnel, Thèse d'Habilitation à Diriger les Recherches, Université Toulouse - Le Mirail. w3.univ-tlse2.fr/erss/textes/pagespersos/bourigault/syntex.html
      - Charniak (2000): « A maximum-entropy-inspired parser », Proceedings of the North American Chapter of the Association for Computational Linguistics, pp. 132–139. http://www.cfilt.iitb.ac.in/~anupama/charniak.php
      - Lin (1995): « Dependency-based Evaluation of Minipar », Proceedings of JCAI. http://www.cs.ualberta.ca/~lindek/downloads.htm
      - Tapanainen & Järvinen (1997): « A Dependency Parser for English », Technical Reports, No. TR-1, Department of General Linguistics, University of Helsinki, March 1997. www.connexor.com
      - TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
      - Evaluation corpus: ftp://ftp.cs.umanitoba.ca/pub/lindek/depeval