This presentation describes the integration of paraphrases of human intransitive adjectives (adjectives of disease, membership, nationality, and generic human adjectives) into the eSPERTo paraphrasing system, a linguistically enhanced paraphrase generator that converts semantically equivalent phrases and sentences based on context-sensitive semantico-syntactic patterns and multiword units. eSPERTo is designed as a hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases and to exploit existing paraphrasing resources. The system is integrated in an interactive application that helps users produce and revise their texts. Among other functionalities, eSPERTo's web platform includes text-editing mechanisms that offer a variety of alternatives for each expression.
We used the Portuguese linguistic resources of Port4NooJ (the Portuguese module), enhanced with the distributional properties of the human intransitive adjectives described in Lexicon-Grammar tables and applied in paraphrase-generation grammars, invoking NooJ's linguistic engine (noojapply). The newly integrated properties made it possible to generate several new transformations, namely: (i) relating adjective, noun, and verb constructions; (ii) adjective constructions supported by different copulative verbs; (iii) constructions involving nationality and other membership relations; (iv) cross-constructions; (v) appropriate noun constructions; and (vi) generic noun phrases.
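As an illustration of the kind of grammar-driven transformation involved, the toy Python sketch below rewrites a disease-adjective construction as its appropriate-noun paraphrase. The lexicon entries, the rule, and the function name are invented for illustration only; eSPERTo's actual grammars run on the NooJ engine and are far richer.

```python
# Toy sketch of one pattern-based paraphrase rule for disease adjectives.
# Lexicon and rule are invented examples, not eSPERTo's real resources.
import re

# Hypothetical mapping from a disease adjective to its appropriate noun.
ADJ_TO_NOUN = {
    "asmático": "asma",
    "diabético": "diabetes",
}

def paraphrase_disease_adj(sentence: str) -> list[str]:
    """Rewrite '<Subj> é Adj' as '<Subj> sofre de N' when Adj is a known
    disease adjective, e.g. 'O Pedro é asmático' -> 'O Pedro sofre de asma'."""
    paraphrases = []
    for adj, noun in ADJ_TO_NOUN.items():
        m = re.match(rf"(.+?) é {adj}$", sentence)
        if m:
            paraphrases.append(f"{m.group(1)} sofre de {noun}")
    return paraphrases

print(paraphrase_disease_adj("O Pedro é asmático"))
```

A real grammar would also license the other copulative verbs and the verbal constructions listed above for the same lexical entry.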
Emily Pitler - Representations from Natural Language Data: Successes and Challenges (MLconf)
This document discusses successes and challenges in natural language processing. It summarizes recent advances in language representation models including BERT and Transformers. However, it notes that models can struggle with realistic "in-the-wild" inputs that differ from their training data, such as code-mixed text with multiple languages or identifying verbal commands. The document advocates for further work addressing mismatches between training and real-world inputs to improve the robustness of NLP models.
Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high-quality machine translation. This paper presents an empirical analysis of non-contiguous multiwords and highlights our use of the Logos Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units, with the goal of translating these units with high fidelity. The phrase-level manual alignments illustrated in the paper were produced with CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.
The document describes OpenLogos, an open-source machine translation system that uses knowledge-rich bilingual dictionaries. These dictionaries contain extensive semantic and syntactic information for entries, expressed in the Semantico-Syntactic Abstraction Language (SAL). Three dictionaries from English into other languages, each containing over 80,000 entries, were created for research purposes. The goal is to make the lexical resources freely available to help develop new NLP tools, especially for under-resourced languages.
The document discusses the contribution of language technologies to the globalization of Portuguese. It describes projects of the Spoken Language Systems Laboratory (Laboratório de Sistemas de Língua Falada, L2F) in areas such as multimedia transcription, distance learning, telehealth, and machine translation, together with the challenges associated with each.
1) The document discusses a linguistic evaluation of support verb constructions performed on the OpenLogos and Google Translate machine translation systems.
2) A corpus of 100 sentences containing support verb constructions was translated into several languages by each system and evaluated both quantitatively and qualitatively.
3) The evaluation found that OpenLogos translated more support verb constructions correctly thanks to its use of linguistic rules and representations, while Google Translate struggled more with non-contiguous and idiomatic constructions due to its statistical nature.
CLUE-Aligner is an interactive tool for annotating pairs of paraphrastic and translated linguistic units. It allows the alignment of contiguous and discontiguous multiword units through a matrix visualization. Alignments are classified as "sure" or "possible" based on criteria of optimal or approximate semantic equivalence. The tool was inspired by previous alignment applications but addresses current shortcomings like a lack of support for discontiguous multiwords. Future work includes using aligned units to train machine learning models for paraphrasing and machine translation applications.
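The alignments such a tool records can be represented minimally as token-index sets on each side, which naturally accommodates discontiguous units, plus a "sure"/"possible" label. The sentence pair and indices below are invented examples, not CLUE-Aligner's actual storage format.

```python
# Minimal internal representation for one phrase-level alignment:
# index sets on each side plus a confidence label. Data is illustrative.
src = ["He", "gave", "the", "plan", "up"]   # "gave ... up" with an insertion
tgt = ["Ele", "abandonou", "o", "plano"]

# The discontiguous unit {gave, up} aligns to the single verb {abandonou}.
alignment = {"src": {1, 4}, "tgt": {1}, "label": "sure"}

def aligned_tokens(tokens, indices):
    """Recover the (possibly discontiguous) unit from its index set."""
    return [tokens[i] for i in sorted(indices)]

print(aligned_tokens(src, alignment["src"]))  # ['gave', 'up']
print(aligned_tokens(tgt, alignment["tgt"]))  # ['abandonou']
```

Index sets, unlike contiguous spans, let a single record capture insertions inside the unit, which is exactly the shortcoming of earlier aligners noted above.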
This document briefly describes how virtual agents work and the process of implementing a conversational bot. It explains that the bot analyzes the customer's message, searches for keywords, and performs predefined actions such as querying a CRM. It then assembles a templated response and sends it to the customer. It also discusses integrating the bot with channels such as chat, SMS, and social networks.
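The keyword-driven flow described above can be sketched in a few lines; the CRM lookup, intent keywords, and response templates here are all invented placeholders, not any real bot framework's API.

```python
# Sketch of a keyword-driven conversational bot: scan the message for
# keywords, run a predefined action (a fake CRM query), fill a template.
def crm_lookup(customer_id: str) -> dict:
    # Stand-in for a real CRM query; returns fabricated record fields.
    return {"id": customer_id, "status": "active", "balance": "120.50"}

# Hypothetical intent keywords mapped to response templates.
INTENTS = {
    "balance": "Hello! The balance on account {id} is {balance}.",
    "status": "Account {id} is currently {status}.",
}

def reply(message: str, customer_id: str) -> str:
    record = crm_lookup(customer_id)
    for keyword, template in INTENTS.items():
        if keyword in message.lower():
            return template.format(**record)
    return "Sorry, I did not understand. An agent will contact you."

print(reply("What is my balance?", "C-42"))
```

Channel integration (chat, SMS, social networks) would wrap this same reply function behind each channel's messaging interface.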
This paper describes ReEscreve, a multi-purpose paraphraser with grammar-based paraphrasing capabilities suitable for source and target control (pre- and post-editing) and useful for both human and machine translation. At the current stage, ReEscreve transforms support verb constructions into verbs or similar expressions with 93.4% precision, and it is progressively being extended to paraphrase other linguistic phenomena, enabling its use as an authoring and stylistic aid in word-processing applications. ReEscreve is freely available on the Internet at: http://www.linguateca.pt/Reescreve/
This paper reports our first attempt at integrating eSPERTo's paraphrastic engine, which is based on the NooJ platform, into two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo's base resources and the modifications to these resources that enabled the production of the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to further improve the results in future experiments.
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds toda a gente vs. todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs. [ficar + V-Ger] (e.g., estive a observar vs. fiquei observando "I was observing"), which are extremely relevant to high-quality paraphrasing. The variants were manually aligned in the e-PACT corpus using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of Portuguese is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic, and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of their kind for Portuguese to be made available to the scientific community for research purposes.
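At its simplest, variety adaptation of this kind amounts to phrase-level substitution over a paraphrase database. The sketch below uses the two EP/BP pairs quoted above; a real system would consult the full database and handle inflection, context, and ambiguity.

```python
# Toy EP -> BP variety adapter built from the two pairs cited in the
# abstract; purely illustrative, ignores inflection and context.
EP_TO_BP = {
    "toda a gente": "todo o mundo",
    "estive a observar": "fiquei observando",
}

def ep_to_bp(text: str) -> str:
    """Replace each known European Portuguese unit with its Brazilian
    counterpart (naive longest-listed-first is not needed for this toy set)."""
    for ep, bp in EP_TO_BP.items():
        text = text.replace(ep, bp)
    return text

print(ep_to_bp("toda a gente sabe que estive a observar"))
```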
The document provides an introduction to natural language processing (NLP), discussing key related areas and various NLP tasks involving syntactic, semantic, and pragmatic analysis of language. It notes that NLP systems aim to allow computers to communicate with humans using everyday language and that ambiguity is ubiquitous in natural language, requiring disambiguation. Both manual and automatic learning approaches to developing NLP systems are examined.
This document provides an introduction and overview of natural language processing (NLP). It discusses how NLP aims to allow computers to communicate with humans using everyday language. It also discusses related areas like artificial intelligence, linguistics, and cognitive science. The document outlines some key aspects of communication like intention, generation, perception, analysis, and incorporation. It discusses the roles of syntax, semantics, and pragmatics. It also covers challenges in NLP like ambiguity and how ambiguity is pervasive and can lead to many possible interpretations. The document contrasts natural languages with computer languages and provides examples of common NLP tasks.
Body-Part Nouns and Whole-Part Relations in Portuguese (Jorge Baptista)
In this paper, we target the extraction of whole-part relations involving human entities and body-part nouns in SYSTEM, a hybrid statistical and rule-based Natural Language Processing chain for Portuguese. A whole-part relation is a semantic relation between an entity that is perceived as a constituent part of another entity, or a member of a set.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
This poster shows paraphrastic suggestions in the eSPERTo paraphrasing system applied to a QA application on a virtual agent and to a summarization tool. It also shows how paraphrases can be used in language learning and the tests envisaged to make eSPERTo a Portuguese learning tool.
This document provides an overview of natural language processing and planning topics including:
- NLP tasks like parsing, machine translation, and information extraction.
- The components of a planning system including the planning agent, state and goal representations, and planning techniques like forward and backward chaining.
- Methods for natural language processing including pattern matching, syntactic analysis, and the stages of NLP like phonological, morphological, syntactic, semantic, and pragmatic analysis.
Natural language processing (NLP) involves developing systems that allow computers to understand and communicate using human language. NLP aims to understand syntax, semantics, and pragmatics. It addresses challenges like ambiguity, where a sentence can have multiple possible meanings. Syntactic parsing is the process of analyzing a sentence's structure using a context-free grammar to produce a parse tree. Top-down and bottom-up parsing are two approaches to syntactic parsing where top-down starts with the start symbol and bottom-up starts with the sentence's terminal symbols.
Natural language processing (NLP) is focused on developing systems that allow computers to communicate with humans using everyday language. NLP involves computational methods to aid understanding of human language. Communication for both speakers and hearers involves processes like intention, generation, perception, analysis, syntactic interpretation, semantic interpretation, and pragmatic interpretation. Natural language is highly ambiguous and must be disambiguated at syntax, semantics, and pragmatics levels. Ambiguities compound and generate many possible interpretations. Both top-down and bottom-up parsing are used to analyze syntax, but explore search spaces differently.
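The contrast between the two parsing strategies can be made concrete with a tiny top-down (recursive-descent) recognizer that expands from the start symbol down to the sentence's terminals. The grammar and sentences are invented examples, and this sketch does not handle left recursion, which a top-down parser cannot process without modification.

```python
# Tiny top-down recognizer for a toy context-free grammar.
# Nonterminals are GRAMMAR keys; anything else is a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]], "N": [["dog"], ["cat"]], "V": [["saw"], ["sleeps"]],
}

def parse(symbol, tokens, pos):
    """Yield every input position reachable after deriving `symbol`
    starting at tokens[pos:], trying each right-hand side in turn."""
    for rhs in GRAMMAR[symbol]:
        ends = [pos]
        for sym in rhs:
            nxt = []
            for p in ends:
                if sym in GRAMMAR:                      # expand nonterminal
                    nxt.extend(parse(sym, tokens, p))
                elif p < len(tokens) and tokens[p] == sym:  # match terminal
                    nxt.append(p + 1)
            ends = nxt
        yield from ends

def recognize(sentence):
    tokens = sentence.split()
    # Accept iff some derivation of S consumes the whole input.
    return len(tokens) in parse("S", tokens, 0)

print(recognize("the dog saw the cat"))  # True
print(recognize("saw the dog"))          # False
```

A bottom-up parser would instead start from the terminal words and reduce them toward S, exploring the search space in the opposite direction.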
This document provides an overview of syntax and its analysis. It discusses:
- What syntax is and why it is studied
- Applications of syntactic analysis like search, paraphrasing and information extraction
- The structure of words (morphology) and sentences (syntax) and their interplay
- Different representations of syntactic structure like trees and dependencies
- Context-free grammars and their use in syntactic analysis
- Representing syntactic information and constraints through attributes and unification
- Phenomena like structural priming and characteristics of spoken language syntax
This paper presents the alignment of verbal predicate constructions with the clitic pronoun "lhe" in the European (EP) and Brazilian (BP) varieties of Portuguese, such as in the sentences "Já lhe arrumaram a bagagem" | "Sua bagagem está seguramente guardada" 'His baggage is safely stowed away', where the EP dative proclitic "lhe" contrasts with the BP possessive pronoun "sua". We selected several different paraphrastic contrasts, such as proclisis and enclisis, and clitic pronouns co-occurring with relative pronouns and negation-type adverbs, among other constructions, to illustrate the linguistic phenomenon. Some differences correspond to real contrasts between the two Portuguese varieties, while others purely represent stylistic choices. The contrasting variants were manually aligned to constitute a gold-standard dataset, and a typology has been established, to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpus using the CLUE-Aligner tool. The research work was developed in the framework of the eSPERTo project.
Adnan: Introduction to Natural Language Processing (Mustafa Jarrar)
This document provides an introduction to natural language processing (NLP). It discusses key topics in NLP including languages and intelligence, the goals of NLP, applications of NLP, and general themes in NLP like ambiguity in language and statistical vs rule-based methods. The document also previews specific NLP techniques that will be covered like part-of-speech tagging, parsing, grammar induction, and finite state analysis. Empirical approaches to NLP are discussed including analyzing word frequencies in corpora and addressing data sparseness issues.
The document discusses challenges in cross-language word alignment. It outlines topics including word alignment concepts and applications, state of the art, and limitations due to phenomena like multiword units. Guidelines are presented for annotating alignments between English, French, Portuguese and Spanish, including challenges like prepositional dependencies, multiword units, and contractions. The goal is to create linguistically informed gold standard alignment sets to help machine translation tasks.
This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression, in the European (EP) and Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records all pairs of paraphrastic units resulting from the alignment task in a database. The corpus used was the children's literature book "Os Livros Que Devoraram o Meu Pai" (The Books that Devoured My Father) by the Portuguese author Afonso Cruz, together with the Brazilian adaptation of this book. The main goal of the work presented here is to gather equivalent phrasal expressions and different syntactic constructions that convey the same meaning in EP and BP, and to contribute to the optimisation of the editorial processes that the adaptation of texts requires, but which are suitable for any type of editorial process. This study provides a scientific basis for future work on editing, proofreading, and converting text to and from any variety of Portuguese from a computational point of view, namely for use in a paraphrasing system with a variety-adaptation functionality, even in the case of a literary text. We contemplate cases that are "challenging" from a literary point of view, looking for alternatives that do not tamper with the imagery richness of the original version.
Keynote @ SEMANTICS 2017 (Amsterdam, September 2017) on the convergences between NLP and KE in the era of the semantic web, with a focus on semantic relation extraction from text.
This document discusses the evolution of natural language processing (NLP) and knowledge engineering (KE) and their convergence, especially with the rise of deep learning and the semantic web. It outlines how NLP and KE have moved from early ambitions of full language understanding and problem solving to more practical, layered approaches focused on specific tasks. The semantic web provides standards and architectures that benefit both NLP and KE by enabling semantic annotation, linking of data, and use of knowledge sources. Deep learning allows NLP to learn representations from large corpora and benefit from semantic resources. Relation extraction and ontology learning from text are examples of the convergence. Challenges remain around contextual language, knowledge assertion, and industrial applications.
Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the world of natural language - unstructured data that by its very nature has important latent information for humans. NLP practitioners have benefitted from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that particularly with Python, the Natural Language Toolkit (NLTK), and to a lesser extent, the Gensim Library.
NLTK is an excellent library for machine learning-based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. Gensim provides vector-based topic modeling, which is currently absent in both NLTK and Scikit-Learn. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.
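As a minimal standard-library counterpart to the word-frequency analysis such a class would do with NLTK's FreqDist, the following snippet counts word occurrences over a tiny invented corpus:

```python
# Word-frequency analysis with only the standard library; NLTK's
# FreqDist offers the same idea with richer tokenization. Corpus invented.
import re
from collections import Counter

corpus = "The cat sat on the mat. The cat slept."
tokens = re.findall(r"[a-z]+", corpus.lower())  # crude tokenizer
freq = Counter(tokens)

print(freq.most_common(2))  # [('the', 3), ('cat', 2)]
```

From here one would typically move to NLTK for proper tokenization and tagging, and to Gensim for vector-based topic modeling, as the description above suggests.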
These slides present the new features of ANNIS 3.1.6 and 3.1.7 and give a basic introduction to what ANNIS is and how to use it. ANNIS is an open-source, cross-platform (Linux, Mac, Windows), web-browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.
Learning Multilingual Semantic Parsers for Question Answering over Linked Data (shakimov)
The document summarizes a PhD dissertation defense talk on learning multilingual semantic parsers for question answering over linked data. It discusses comparing neural and probabilistic graphical model architectures for semantic parsing to map natural language to formal meaning representations. The talk outlines introducing dependency parse tree-based approaches, evaluating different model architectures, and addressing challenges in building multilingual question answering systems over structured knowledge bases.
This paper is the result of collaboration between two projects: Emocionário and eSPERTo.
Emocionário aims at organizing emotions in Portuguese and annotating them in corpora. eSPERTo is a paraphrasing system that uses the NooJ linguistic engine, grammars, and lexicons.
The aims of this collaboration were fivefold: (i) from Emocionário's point of view, it would be very useful to have an emotion paraphraser to help identify more cases of emotions in its corpora; (ii) from eSPERTo's point of view, adding emotion paraphrases would considerably enhance its paraphrasing power; (iii) applying the emotion classification to a hitherto unused application domain would be a good way to evaluate Emocionário's capabilities and shortcomings; (iv) both projects would gain from learning more about real paraphrases of emotion in text; and (v) an interesting question is to assess how good the methodology employed to harvest emotion paraphrases from parallel text is.
This paper reports our first attempt of integrating eSPERTo’s paraphrastic engine, which is based on NooJ platform, with two application scenarios: a conversational agent, and a summarization system. We briefly describe eSPERTo’s base resources, and the necessary modifications to these resources
that enabled the production of paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to further improve the achieved results in future experiments.
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and
phrasal units, such as the compounds toda a gente vs todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired
in the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases.1 The construction of a larger dataset of
paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a
key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of its kind for Portuguese to become available to the scientific community for research purposes.
The document provides an introduction to natural language processing (NLP), discussing key related areas and various NLP tasks involving syntactic, semantic, and pragmatic analysis of language. It notes that NLP systems aim to allow computers to communicate with humans using everyday language and that ambiguity is ubiquitous in natural language, requiring disambiguation. Both manual and automatic learning approaches to developing NLP systems are examined.
This document provides an introduction and overview of natural language processing (NLP). It discusses how NLP aims to allow computers to communicate with humans using everyday language. It also discusses related areas like artificial intelligence, linguistics, and cognitive science. The document outlines some key aspects of communication like intention, generation, perception, analysis, and incorporation. It discusses the roles of syntax, semantics, and pragmatics. It also covers challenges in NLP like ambiguity and how ambiguity is pervasive and can lead to many possible interpretations. The document contrasts natural languages with computer languages and provides examples of common NLP tasks.
Body-Part Nouns and Whole-Part Relations in PortugueseJorge Baptista
In this paper, we target the extraction of whole-part rela- tions involving human entities and body-part nouns in SYSTEM, a hy- brid statistical and rule-based Natural Language Processing chain for Portuguese. Whole-part relation is a semantic relation between an entity that is perceived as a constituent part of another entity, or a member of a set.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
This poster shows paraphrastic suggestions in the eSPERTo paraphrasing system applied to a QA application on a virtual agent and to a summarization tool. It also shows how paraphrases can be used in language learning and the tests envisaged to make eSPERTo a Portuguese learning tool.
This document provides an overview of natural language processing and planning topics including:
- NLP tasks like parsing, machine translation, and information extraction.
- The components of a planning system including the planning agent, state and goal representations, and planning techniques like forward and backward chaining.
- Methods for natural language processing including pattern matching, syntactic analysis, and the stages of NLP like phonological, morphological, syntactic, semantic, and pragmatic analysis.
Natural language processing (NLP) involves developing systems that allow computers to understand and communicate using human language. NLP aims to understand syntax, semantics, and pragmatics. It addresses challenges like ambiguity, where a sentence can have multiple possible meanings. Syntactic parsing is the process of analyzing a sentence's structure using a context-free grammar to produce a parse tree. Top-down and bottom-up parsing are two approaches to syntactic parsing where top-down starts with the start symbol and bottom-up starts with the sentence's terminal symbols.
Natural language processing (NLP) is focused on developing systems that allow computers to communicate with humans using everyday language. NLP involves computational methods to aid understanding of human language. Communication for both speakers and hearers involves processes like intention, generation, perception, analysis, syntactic interpretation, semantic interpretation, and pragmatic interpretation. Natural language is highly ambiguous and must be disambiguated at syntax, semantics, and pragmatics levels. Ambiguities compound and generate many possible interpretations. Both top-down and bottom-up parsing are used to analyze syntax, but explore search spaces differently.
This document provides an overview of syntax and its analysis. It discusses:
- What syntax is and why it is studied
- Applications of syntactic analysis like search, paraphrasing and information extraction
- The structure of words (morphology) and sentences (syntax) and their interplay
- Different representations of syntactic structure like trees and dependencies
- Context-free grammars and their use in syntactic analysis
- Representing syntactic information and constraints through attributes and unification
- Phenomena like structural priming and characteristics of spoken language syntax
This paper presents the alignment of verbal predicate constructions with the clitic pronoun "lhe" in the European (EP) and Brazilian (BP) varieties of Portuguese, such as in the sentences "Já lhe} arrumaram a bagagem" | "Sua bagagem está seguramente guardada" 'His baggage is safely stowed away', where the EP dative proclisis "lhe" contrasts with the BP possessive pronoun "sua". We have selected several different paraphrastic contrasts, such as proclisis and enclisis, clitic pronouns co-occurring with relative pronouns and negation-type adverbs, among other constructions to illustrate the linguistic phenomenon. Some differences correspond to real contrasts between the two Portuguese varieties, while others purely represent stylistic choices. The contrasting variants were manually aligned in order to constitute a gold standard dataset, and a typology has been established to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpus using the CLUE-Aligner tool. The research work was developed in the framework of the eSPERTo project.
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
This document provides an introduction to natural language processing (NLP). It discusses key topics in NLP including languages and intelligence, the goals of NLP, applications of NLP, and general themes in NLP like ambiguity in language and statistical vs rule-based methods. The document also previews specific NLP techniques that will be covered like part-of-speech tagging, parsing, grammar induction, and finite state analysis. Empirical approaches to NLP are discussed including analyzing word frequencies in corpora and addressing data sparseness issues.
The document discusses challenges in cross-language word alignment. It outlines topics including word alignment concepts and applications, state of the art, and limitations due to phenomena like multiword units. Guidelines are presented for annotating alignments between English, French, Portuguese and Spanish, including challenges like prepositional dependencies, multiword units, and contractions. The goal is to create linguistically informed gold standard alignment sets to help machine translation tasks.
This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression, in the European (EP) and the Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records all pairs of paraphrastic units resulting from the alignment task in a database. The corpus used was a children’s literature book, "Os Livros Que Devoraram o Meu Pai" (The Books that Devoured My Father) by the Portuguese author Afonso Cruz, and the Brazilian adaptation of this book. The main goal of the work presented here is to gather equivalent phrasal expressions and different syntactic constructions which convey the same meaning in EP and BP, and to contribute to the optimisation of the editorial processes required in the adaptation of texts, but which are suitable for any type of editorial process. This study provides a scientific basis for future work in the area of editing, proofreading and converting text to and from any variety of Portuguese from a computational point of view, namely for use in a paraphrasing system with a variety-adaptation functionality, even in the case of a literary text. We contemplate “challenging” cases, from a literary point of view, looking for alternatives that do not tamper with the imagery richness of the original version.
Keynote @ SEMANTICS 2017 (Amsterdam, Sept 2017) about convergences between NLP and KE in the era of the semantic web, with a focus on semantic relation extraction from text.
This document discusses the evolution of natural language processing (NLP) and knowledge engineering (KE) and their convergence, especially with the rise of deep learning and the semantic web. It outlines how NLP and KE have moved from early ambitions of full language understanding and problem solving to more practical, layered approaches focused on specific tasks. The semantic web provides standards and architectures that benefit both NLP and KE by enabling semantic annotation, linking of data, and use of knowledge sources. Deep learning allows NLP to learn representations from large corpora and benefit from semantic resources. Relation extraction and ontology learning from text are examples of the convergence. Challenges remain around contextual language, knowledge assertion, and industrial applications.
Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the world of natural language - unstructured data that by its very nature has important latent information for humans. NLP practitioners have benefitted from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that particularly with Python, the Natural Language Toolkit (NLTK), and to a lesser extent, the Gensim Library.
NLTK is an excellent library for machine learning-based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. Gensim provides vector-based topic modeling, which is currently absent in both NLTK and Scikit-Learn. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.
These slides present the new features of ANNIS 3.1.6 and 3.1.7, and give a basic introduction to what ANNIS is and how to use it. ANNIS is an open source, cross-platform (Linux, Mac, Windows), web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.
Learning Multilingual Semantic Parsers for Question Answering over Linked Data (shakimov)
The document summarizes a PhD dissertation defense talk on learning multilingual semantic parsers for question answering over linked data. It discusses comparing neural and probabilistic graphical model architectures for semantic parsing to map natural language to formal meaning representations. The talk outlines introducing dependency parse tree-based approaches, evaluating different model architectures, and addressing challenges in building multilingual question answering systems over structured knowledge bases.
Similar to Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese (20)
This paper is the result of collaboration between two projects: Emocionário and eSPERTo.
Emocionário aims at organizing emotions in Portuguese and annotating them in corpora. eSPERTo is a paraphrasing system that uses the NooJ linguistic engine, grammars, and lexicons.
The aims for this collaboration were fivefold: (i) from Emocionário’s point of view, it would be very useful to have an emotion paraphraser to help identify more cases of emotions in our corpora; (ii) from eSPERTo’s point of view, adding emotion paraphrases would considerably enhance its paraphrasing power; (iii) applying the emotion classification to a hitherto unused application domain would be a good way to evaluate Emocionário’s capabilities and shortcomings; (iv) both projects would gain from learning more about real paraphrases of emotion in text; and finally, (v) an interesting question is to assess how good the methodology employed to harvest emotion paraphrases from parallel text is.
This study proposes a comparative analysis, linguistic but also literary and cultural, of the Portuguese and Brazilian editions of a work of children's and young-adult literature, Os Livros que Devoraram o Meu Pai, by the Portuguese author Afonso Cruz, which appears on the suggested reading lists of the curricula of both Portugal and Brazil. The specific goal is to present and discuss a selection of lexical units, idioms, and phrasal structures with adjectival function that alternate between the two varieties, i.e., between the author's choices in the EP variety and the corresponding solutions adopted in the BP version. The chosen methodology centres on contrastive linguistic analysis carried out with the aid of digital tools based on the eSPERTo project, using semi-automatic alignments produced with the CLUE-Aligner tool (REF). The corpus consists of the Portuguese and Brazilian editions of the work under study. The general goal of this work is to optimise the editorial processes necessarily involved in the adaptation of texts, and to survey the main difficulties of that process. This entails, among other things, an awareness of the limits imposed by a literary text, such as the fine line between indispensable adaptation and excessive intervention. Building on the results obtained, we also intend to encourage research into linguistic resources for the purposes of editing, proofreading, and teaching Portuguese as a first and/or foreign language, among other applications.
This document provides an introduction and welcome message from the local organizers of the 3rd annual enetCollect MC meeting being held in Lisbon, Portugal. The summary includes:
1) The organizers thank the speakers, chairs, members, volunteers, and sponsors for their contributions to the meeting.
2) They introduce the official host, Professor Isabel Trancoso, and provide details on her extensive experience and leadership roles in spoken language processing.
3) The organizers conclude by thanking everyone for their participation in the meeting in Lisbon.
This document discusses using syntactic-semantic analysis for information extraction in biomedicine. It aims to extract biomolecular events like phosphorylation from text. It uses dictionaries of entities and verbs associated with event types, and NooJ grammars to identify events. Evaluation on a shared task dataset shows average recall of 36.76% and precision of 65.58% for six event types. While results are promising, it discusses limitations like manual pattern identification and challenges with more complex event constructions.
This presentation addresses the problem of translating SVCs (support verb constructions), such as fazer uma operação (to make an operation). In particular, it focuses on the MT of biomedical-related SVCs. It argues that paraphrasing can help translate these MWEs with higher quality. This work is based on my PhD research, which addressed the problem of paraphrasing and translating SVCs in general.
ReWriter uses linguistically based automated paraphrasing and text-editing mechanisms to help users with their writing needs by providing suggestions for customized text authoring. It also generates word and phrasal usage data to help guide decision-making. ReWriter can be used in word processing applications or linguistic quality control for both source and target texts and it is a useful pre-editor for machine translation. The linguistic resources behind ReWriter, the paraphrasing grammars, and the tools from which ReWriter was derived will also be described, in this particular case, we illustrate ReWriter as a tool to process legal language.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
The poster shows how chatbots can play an important role in Language Learning applications.
This paper presents the automation process of paraphrasing and converting Portuguese constructions typical of informal or spoken language into a formal written language. We illustrate this automation process with examples extracted from the e-PACT corpus that involve the placement of clitic pronouns in verbal compound contexts. Our task consists in paraphrasing and normalizing, among others, constructions such as "vou-lhe/posso-lhe fazer uma surpresa" into "vou/posso fazer-lhe uma surpresa" `lit: I will/can\_to him/her make a surprise / I will/can make\_to him/her a surprise; I will/can make him/her a surprise', where the clitic pronoun "lhe" migrates from an enclitic position after the first verb of the verbal compound to an enclitic position after the main verb, which is the verb responsible for the selection of that pronominal argument. The first verb is either an auxiliary verb or a volitive verb, e.g. "querer" `want'. This is a standard revision procedure in EP. Cases like this represent linguistic phenomena where language students and language users in general get confused or stumble. The paper focuses on general language where the phenomena being observed occur, describes examples of interest found in the corpus, and presents an automatic solution for the normalization of informal syntactic inadequacies found in the researched structures into standard formal writing structures through the application of very generic transformational grammars.
This paper performs a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually in a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed and a decision was made as to whether the contraction should be maintained or decomposed in the alignment. Decomposition was required in the cases in which the two words that have been concatenated, i.e., the preposition and the determiner or pronoun, go in two separate translation alignment pairs (PT- [no seio de] [a União Europeia] EN- [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT- [no que diz respeito a] EN- [with regard to] or PT- [além disso] EN- [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
This document describes the eSPERTo system, which generates paraphrases for text editing and revision. The main goal of the project is to develop a system capable of identifying and generating paraphrases to improve comprehension, simplify language, and support the learning of the Portuguese language. The system can be useful in several settings, such as education, journalism, and translation.
ReEscreve (in English, ReWriter) is a multi-purpose paraphraser that uses grammar-based paraphrasing capabilities suitable for source and target control (pre- and post-editing) and is useful for human and machine translation.
Spoken Language Systems Lab @ INESC-ID poster presented at the 1st meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Eurac Research in Bolzano, Italy.
This presentation describes the integration of lexicon-grammar of predicate nouns with the support verb "fazer" ("to do" or "to make") into Port4NooJ, the Portuguese language module for NooJ. Port4NooJ resources are used by eSPERTo system to generate paraphrases, i.e., alternative ways to say or write the same sentence.
This presentation addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for MT independently of the approach, and require adequate linguistic quality evaluation metrics founded on a systematic categorization of errors by MT expert linguists. We propose an empirically-driven taxonomy for multiwords, and highlight the need for the development of specific corpora for multiword evaluation. Finally, the paper presents the Logos approach to multiword processing, illustrating how semantico-syntactic rules contribute to multiword translation quality.
The document discusses content writing optimization using a rewriter tool. It outlines the rewriter's capabilities such as providing paraphrases through synonyms, support verb transformations, and other linguistic rules. The rewriter has been developed to aid writing, translation, and machine translation. Evaluation results show the rewriter improves translation quality by paraphrasing complex linguistic constructs like support verb constructions. The document also discusses the rewriter's interface and linguistic resources and rules, and how the tool could benefit writers, translators and language learners.
The document discusses human translation versus machine translation. It notes that while human translation requires skills like language proficiency and cultural knowledge, machine translation relies on linguistics and computer science. The document also outlines some challenges of machine translation, such as ambiguity and complex grammar, and presents examples of how machine translation systems struggle with issues like lexical ambiguity. Resources and tools developed for machine translation are also summarized, including lexical databases and paraphrasing tools.
In this session I will talk about linguistic quality control and paraphrases, and their role as aids to machine translation. In addition, I will present a tool that uses paraphrastic methods for translation.
Escola de Verão Belinda Maia (Edv 2009)
(FLUP, Porto, Portugal, 29 June – 3 July 2009)
More from INESC-ID (Spoken Language Systems Laboratory - L2F) (20)
Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese
1. Paraphrasing Human Intransitive Adjective Constructions in Port4NooJ
Cristina Mota¹, Paula Carvalho¹,², Francisco Raposo¹, Anabela Barreiro¹
¹ INESC-ID, Lisbon
² Universidade Europeia | Laureate International Universities
International NooJ 2015 Conference · Minsk, 13 June
2. Introduction to the eSPERTo Project
eSPERTo – System for Paraphrasing in Editing and Revision of Texts
• Main objective
  – Design and development of a linguistically enhanced paraphrase generator
    • Semantico-syntactic and multiword units
    • Sensitive to context
• Method
  – Hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases
  – Exploitation of existing paraphrasing resources
• Web platform
  – Interactive application to help Portuguese language learners in producing and revising their texts
  – Text-editing mechanisms which provide a variety of alternatives for each expression
  – Users can choose or suggest expressions that can be immediately applied to their text
  – Support to writing optimization, understandability and translatability
3. Linguistic Resources
• Linguistic knowledge databases: Port4NooJ, Eng4NooJ
• Originally (English-Portuguese) OpenLogos resources (http://logos-os.dfki.de/)
• Converted into NooJ format
• Enhanced with new properties, including derivational, morpho-syntactic and semantic relations
4. Earlier versions
• Phrasal verbs into equivalent expressions
  – to clear up (weather) = (weather) to become better/brighter
• Support verb constructions into single verbs
  – to make a decision = to decide
  – to make a presentation of = present
  – to give support to N(AN) = to support N(AN)
  – to get into contact with = to contact
  – to become acid = to acidify
• Support verb constructions into their stylistic variants
  – to make an audit = to perform an audit
  – to make an impression = to create an impression
• Aspectual constructions into single verbs
  – to launch an attack = to attack
5. Earlier versions
• Adverbs (compounds into single adverbs)
  – in a constructive way = constructively
  – on purpose = purposely = deliberately
• Relatives into participial adjectives
  – the president that was elected = the president elect
• Relatives into possessives
  – the role that Europe plays/has = the role of Europe
  – the position that the Church defends = the position of the Church
• Relatives into compound nouns (and vice-versa)
  – a container for the milk = a milk container
  – a bottle made of plastic = a plastic bottle
• Agentive passives into actives
  – the young man is released by the police officer = the police officer releases the young man
6. eSPERTo Architecture
[Architecture diagram: input text plus resource selection is processed by noojapply + STRING, which draws on linguistic resources (Port4NooJ, Eng4Nooj, Ital4NooJ, Fren4NooJ, Ger4NooJ, Spa4NooJ dictionaries and grammars) to produce paraphrase suggestions; eSPERTo online combines the text with the suggestions, and user feedback feeds a hybrid paraphrase acquisition module whose dictionaries and grammars undergo linguist validation.]
9. eSPERTo: noojapply Integration
noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt
eSPERTo Web Interface: user configuration
11. eSPERTo: noojapply Integration
noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt
eSPERTo Web Interface: result presentation
teste.txt:0,17,O homem que é americano
teste.txt:0,17,O homem da América
teste.txt:0,17,O homem de nacionalidade americana
teste.txt:0,17,O homem de naturalidade americana
teste.txt:0,17,O homem de origem americana
teste.txt:0,39,o trabalho foi apresentado pelo homem americano
teste.txt:18,10,efectuar apresentação
teste.txt:18,10,fazer apresentação
teste.txt:18,10,realizar apresentação
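Each line of the noojapply result above has the shape `file:offset,length,paraphrase`, so several suggestions for the same source span share an offset and length. A minimal Python sketch of parsing those triples and grouping the suggestions per span (an assumed post-processing step, not part of NooJ itself):

```python
def parse_noojapply_output(lines):
    """Parse noojapply result lines of the form 'file:offset,length,text'
    into paraphrase suggestions grouped by (offset, length) source span."""
    suggestions = {}
    for line in lines:
        # Split off the file name, then offset, length and the paraphrase text
        _, _, payload = line.partition(":")
        offset, length, text = payload.split(",", 2)
        span = (int(offset), int(length))
        suggestions.setdefault(span, []).append(text)
    return suggestions

output = [
    "teste.txt:0,17,O homem que é americano",
    "teste.txt:0,17,O homem da América",
    "teste.txt:18,10,fazer apresentação",
]
print(parse_noojapply_output(output)[(0, 17)])
# → ['O homem que é americano', 'O homem da América']
```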
12. eSPERTo: noojapply Integration
the man who is American
the man from America
the man with American nationality
…
The American man
https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl
13. LG of Portuguese Human Intransitive Adjectives
• eSPERTo was enhanced with new paraphrases, derived from 15 Lexicon-Grammar (LG) tables describing the distributional properties of 4,250 human intransitive adjectives (HIA) (Carvalho, 2008):
  – Syntactic and semantic nature of the subject modified by each adjective;
  – Copulative verbs selected by each adjective;
  – Constraints on the quantification of adjectives by an adverb or a degree morpheme;
  – Position of adjectives in adnominal context;
  – Optional adjective "complements";
  – Generic NP and cross-constructions, where the adjective fills the head of a noun phrase;
  – Characterizing indefinite constructions, where the adjective occurs after an indefinite article;
  – Exclamative sentences expressing insult.
14. Adjective Selection
• CETEMPublico Adj: 17,300 lemmas
  ↓ lookup with the LabEL lexical resources (LABEL-LEX) (Ranchhod et al., 2004)
• Predicate Adj: 13,875 lemmas
  ↓ pre-selection and classification of Adj according to the linguistic criteria defined in Carvalho (2001): Nhum Vcop Adj
• Adj Intrans Hum: 4,250 lemmas
  – Adj Doen: 187 | Adj Filo: 303 | Adj Nac: 651 | Adj Hum: 3,109
15. Hum Adj Subclassification Criteria
[Classification tree: ADJ HUM is split by the copulative verb selected (SER, SER + ESTAR, ESTAR) and by the base constructions accepted (N0 ser Adj, N0 ser um Adj, N0 estar Adj); each branch is further subdivided by the nature of the subject (N0 =: Nhum, Nap de Nhum, QueF).]
16. Hum Adj Subclassification Criteria
[The same classification tree, with the resulting classes and an example adjective for each:]
SER (N0 ser Adj / N0 ser um Adj): SAHP1 inteligente, SAHP2 atlético, SAHP3 culto, SAHC1 idiota, SAHC2 sedutor, SAHC3 inculto
ESTAR (N0 estar Adj): EAHP2 abatido, EAHP3 zangado
SER + ESTAR (N0 (ser+estar) Adj): SEAHP2 bonito, SEAHP3 velho, SEAHC2 gordo, SEAHC3 bêbado
18. New Transformations
• Adjective, noun and verb morphologically related constructions
  – está zangado (is angry) = zangou-se (got angry) = esteve envolvido numa zanga (was involved in a quarrel)
• Adjective constructions supported by different copulative verbs
  – estar perdido (to be lost) = andar perdido (to go around lost)
• Constructions involving nationality and other membership relations
  – de origem portuguesa (of Portuguese origin/roots) = portugueses (Portuguese) = de Portugal (from Portugal)
  – benfiquista (Benfica fan) = do Sport Lisboa e Benfica (of Sport Lisboa e Benfica)
• Cross-constructions
  – o idiota do rapaz (the idiot of the boy) = o rapaz é um idiota (the boy is an idiot)
• Appropriate noun constructions
  – foi moderado nos seus comentários (he was moderate in his comments) = os seus comentários foram moderados (his comments were moderate) = foi moderado (he was moderate)
• Generic noun phrases
  – é um indivíduo estúpido (he is a stupid individual) = é um estúpido (he is a fool) = é estúpido (he is stupid)
19. Integration of LG of Portuguese Human Intransitive Adjectives
– From LG tables to NooJ dictionaries
• Mostly done automatically with different scripts: LG tables (Adjectivos_IH) → Port4NooJ
✓ If the adjective is in Port4NooJ, merge the LG properties with the dictionary entry; else create a new entry
✓ Create FLX and DRV codes and corresponding rules as needed
✓ Check for missing FLX and DRV codes
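The merge-or-create step can be sketched as follows. This is a minimal illustration only: modelling entries as property lists keyed by (lemma, POS) is an assumption for the sketch, not the actual Port4NooJ file format.

```python
def merge_lg_properties(dictionary, lemma, lg_props):
    """Merge LG table properties into an existing adjective entry,
    or create a new entry when the lemma is unknown."""
    key = (lemma, "A")  # adjective entries only
    if key in dictionary:
        # Keep existing properties; append new LG ones without duplicates
        existing = dictionary[key]
        existing.extend(p for p in lg_props if p not in existing)
    else:
        dictionary[key] = list(lg_props)
    return dictionary[key]

d = {("alto", "A"): ["FLX=ALTO"]}
merge_lg_properties(d, "alto", ["IH", "FLX=ALTO"])
print(d[("alto", "A")])      # → ['FLX=ALTO', 'IH']
merge_lg_properties(d, "abissínio", ["FLX=ALTO", "IH", "Table=SAN"])
print(d[("abissínio", "A")]) # → ['FLX=ALTO', 'IH', 'Table=SAN']
```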
20. Integration of LG of PT HIA
– From LG tables to NooJ dictionaries
• Representation of LG table properties
+Top=Abissínia +TopDET=a +NclassPnacionalidade +NAdj +Vcopser +IH +Table=SAN
21. Integration of LG of PT HIA
– From LG tables to NooJ dictionaries
• Representation of LG table properties
+Top=Abissínia +TopDET=a +NclassPnacionalidade +NAdj +Vcopser +IH +Table=SAN
• Determined automatically by consulting the AC/DC corpora:
o homem abissínio ↔ o homem da Abissínia
o homem açoriano ↔ o homem dos Açores
o homem português ↔ o homem de Portugal
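The Adj ↔ de + toponym rewriting shown above can be illustrated with a small sketch (not the actual NooJ grammar): the properties +Top= and +TopDET= supply the place name and its article, and de contracts with the article following standard Portuguese morphology.

```python
# Contraction of the preposition "de" with the definite article
CONTRACTIONS = {"o": "do", "a": "da", "os": "dos", "as": "das"}

def toponym_paraphrase(np_head, top, top_det=None):
    """Rewrite e.g. 'o homem' (+Top=Abissínia, +TopDET=a)
    as 'o homem da Abissínia'."""
    prep = CONTRACTIONS.get(top_det, "de")  # bare "de" when no article
    return f"{np_head} {prep} {top}"

print(toponym_paraphrase("o homem", "Abissínia", "a"))   # → o homem da Abissínia
print(toponym_paraphrase("o homem", "Açores", "os"))     # → o homem dos Açores
print(toponym_paraphrase("o homem", "Portugal"))         # → o homem de Portugal
```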
22. Integration of LG of PT HIA
– From LG tables to NooJ dictionaries
• Representation of LG table properties
+IH +Table=SEAHP3 +Nome=alegria +Verbo=alegrar-se +Nnhum
23. Integration of LG of PT HIA
– From LG tables to NooJ dictionaries
• Representation of LG table properties
+IH +Table=SEAHP3 +DRV=A2N143:CASA +DRV=A2V6:FALAR +Reflexivo +Nnhum
24. Integration of LG of PT HIA
– From LG tables to NooJ dictionaries
• Representation of LG table properties
+IH +Table=SEAHP3 +DRV=A2N143:CASA +DRV=A2V6:FALAR +Reflexivo +Nnhum
• The DRV code is determined and formalized automatically by finding the radical shared by the adjective and the noun or verb:
  alegr(ia) = A2N143 = B1ia/N
  alegr(ar) = A2V6 = B1ar/V
• The FLX code is determined by consulting Port4NooJ:
  alegria,N+FLX=CASA+AB+state+EN=joy+SYNN=contentamento
  alegrar,V+FLX=FALAR+Aux=1+PRECVagree-type+Subset=…
• If the derived form does not exist, its code is assigned automatically.
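The radical-finding step above can be sketched in a few lines: strip the part of the adjective not shared with the derived word and record the derived word's ending, yielding the B&lt;n&gt;&lt;suffix&gt;/&lt;POS&gt; operator (an assumed implementation of the script, matching the alegre → alegr(ia) = B1ia/N example).

```python
import os

def drv_operator(adjective, derived, pos):
    """Build a NooJ-style derivation operator: delete the n final
    characters of the adjective not shared with the derived form,
    then append the derived form's ending."""
    radical = os.path.commonprefix([adjective, derived])
    n_delete = len(adjective) - len(radical)  # characters to strip
    suffix = derived[len(radical):]           # ending to append
    return f"B{n_delete}{suffix}/{pos}"

print(drv_operator("alegre", "alegria", "N"))  # → B1ia/N
print(drv_operator("alegre", "alegrar", "V"))  # → B1ar/V
```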
28. Integration of LG of PT HIA
– From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
② Adjective not in Port4NooJ (or in it, but derived from another entry):
✓ FLX code is assigned automatically given the ending of the word
✓ Entries are checked for missing FLX codes and reviewed by a linguist
✓ All other properties come from the LG table
abissínio,A+FLX=ALTO+IH+Table=SAN+Nhum+Vcopser+Vcoptornarse+UMNclas
+UmModif+NclassPserde+NclassPorigem+NclassPnacionalidade
+NclassPnaturalidade+NAdj+Top=Abissínia+TopDET=a
(no entry in Port4NooJ)
arranhado,A+FLX=ALTO+IH+Table=EAHP2+Nhum+NapdeNhum+Npc+Vcopestar
+AdvQuant+Superlativo+NAdj+NhumVopAPrepNap+deemEDefNap
+DRV=A2N4:BALÃO+DRV=A2V2:FALAR+Reflexivo
(in Port4NooJ: arranhar,V+FLX=FALAR...)
solteiro,A+FLX=ALTO+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopficar
+Vcoppermanecer+Vcopencontrarse+UMNclas+UmModif+Superlativo+NAdj
(in Port4NooJ: solteiro,N+FLX=ANO+AN+des+EN=bachelor)
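Assigning the FLX code from the word ending, as described above, can be sketched with a lookup table. Only the ALTO paradigm for -o adjectives is attested in the entries shown; the other ending-to-paradigm pairs below are hypothetical placeholders, not the real Port4NooJ rule set.

```python
# Hypothetical fragment of an ending→inflection-paradigm map;
# ALTO (-o/-a/-os/-as, e.g. abissínio) is from the slides, the rest assumed.
ENDING_TO_FLX = {
    "o": "ALTO",
    "e": "VERDE",   # assumed code for gender-invariable -e adjectives
    "l": "AZUL",    # assumed code for -l adjectives
}

def assign_flx(adjective, default="ALTO"):
    """Pick an inflection paradigm from the adjective's final letter."""
    return ENDING_TO_FLX.get(adjective[-1], default)

print(assign_flx("abissínio"))  # → ALTO
print(assign_flx("arranhado"))  # → ALTO
```

Entries produced this way are then checked for missing codes and reviewed by a linguist, as the slide notes.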
29. Integration of LG of PT HIA
– From LG to NooJ grammars
• Option 1: Syntactic Parsing
30. Integration of LG of PT HIA
– From LG to NooJ grammars
• Option 1: Syntactic Parsing
Input1:  o homem é tonto
Output1: <REESCREVE+TEXTO=é um tonto> é tonto </REESCREVE>
31. Integration of LG of PT HIA
– From LG to NooJ grammars
• Option 1: Syntactic Parsing
Input2:  o homem é um tonto
Output2: <REESCREVE+TEXTO=é tonto> é um tonto </REESCREVE>
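The REESCREVE annotation marks the span to rewrite and carries the replacement in its TEXTO attribute, so applying a suggestion reduces to a span substitution. A minimal sketch of that application step (assumed post-processing outside NooJ):

```python
def apply_reescreve(sentence, span_text, replacement):
    """Replace the annotated span with the paraphrase given in TEXTO
    (first occurrence only, matching the annotated position)."""
    return sentence.replace(span_text, replacement, 1)

print(apply_reescreve("o homem é tonto", "é tonto", "é um tonto"))
# → o homem é um tonto
print(apply_reescreve("o homem é um tonto", "é um tonto", "é tonto"))
# → o homem é tonto
```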
32. Integration of LG of PT HIA
– From LG to NooJ grammars
• Option 2: Transformational module
33. Integration of LG of PT HIA
– From LG to NooJ grammars
• Option 2: Transformational module
Input1 ↔ Input2
é tonto, REESCREVE+Cpred
é um tonto, REESCREVE+CCI
34. Preliminary Results
• 5,150 human intransitive adjectives
• 677 new derivational paradigms
• Example grammars for the syntactic parser and the transformational module
• 50% increase in Port4NooJ adjective entries
35. Preliminary Results
Table    Example      In Port4NooJ   New    % In
SAHP1    inteligente  303            247    55%
SAHP2    atlético     142            226    39%
SEAHP2   bonito       53             87     38%
SAHC1    idiota       115            229    33%
SAHP3    culto        97             263    27%
SEAHP3   velho        32             93     26%
SEAHC2   gordo        14             41     25%
SAF      anarquista   70             234    23%
SEAHC3   bêbado       15             53     22%
SEAD     leproso      39             149    21%
EAHP3    zangado      54             213    20%
SAHC2    sedutor      41             177    19%
EAHP2    abatido      18             87     17%
SAN      americano    108            544    17%
SAHC3    inculto      54             465    10%
Total                 1155           3108   26%
36. Next Steps
• Complete the integration of the LG of human intransitive adjectives
  – By creating all the grammars needed to process the constructions formalized in the LG
• Revise and evaluate the new resources
• Integrate and adapt additional LG grammars:
  – Constructions with Vsup ser de (Baptista, 2005)
  – Constructions with Vsup fazer (Chacoto, 2005)
• Use the grammar paraphrase knowledge to create a corpus of paraphrases and develop eSPERTo's hybrid paraphrase acquisition engine
  – Train a machine-learning paraphrase acquisition system
  – Annotate semantico-syntactic and multiword paraphrases in corpora for use in training and evaluation
  – Merge in paraphrases collected statistically