Little words - Big meanings (in MT syntactic transfer)
Mihaela Colhon
University of Craiova
Department of Computer Science
April 25, 2012
Table of contents
English-Romanian Phrase Alignment
Romanian treebank
Function words = syntactic glue for sentences
English Functional Words Sequences
Romanian Syntactic Sequences
English-Romanian Parallel Sequences with Syntactic Constituents
ADVP, CC, CD, DT
DT(cont.), IN
IN(cont.), JJ, MD
MD(cont.), NN, NNS, NP, PDT, PP, PRP, RB, S, SBAR
TO, VBG, VP, WDT, WRB
English Syntactic Sequences with FW
[DT NN NN]
[IN/as, NP]
[IN/at, NP]
[IN/by, NP]
[IN/for, NP]
[IN/of, NP]
English-Romanian Phrase Alignment
Romanian treebank
Accuracy: 87% for the Romanian part of the English-Romanian parallel treebank, measured against the Romanian chunker annotations.
Token word (MSD)      Treebank tags       Chunker annotations   Number of matches
vot (Ncms-n)          VP VP NP VP VP S    Np Pp                 no match
de-asemenea (Rgp)     ADVP VP S ROOT      Ap                    one match
economic (Afpms-n)    ADJP NP NP VP ...   Ap Np Pp              two matches
dividende (Ncfp-n)    NP PP VP S ROOT     Np Pp                 two matches
în (Spsa)             PP VP PP S          Ap Vp Pp              three matches
Table: Example of parallel sequences of treebank tags and chunker annotations, together with their matching degrees.
Function words = syntactic glue for sentences
In any syntactic structure we can identify two major categories of words:
Content words, which identify objects, entities, properties, relationships or events; syntactically, they are nouns, adjectives, verbs and adverbs.
Functional words, which help put words together into a correct sentence structure and indicate how words are related to each other. Functional words can be determiners, quantifiers, prepositions or connectives.
English Functional Words Sequences
A total of 2120 English functional-word constructions, together with their Romanian translations, were extracted from the English-Romanian Parallel Treebank with Syntactic Constituents.
English functional words = words that carry one of the following tags in the Penn Treebank POS tagset: CC, DT, IN, MD, PRP, PP$, RP, TO, WDT, WP, WP$, WRB.
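Deciding whether an English word is functional amounts to checking its tag against a set. Below is a minimal sketch of that check. It is not the author's extraction code. The tag list is copied from the slide, and `PRP$` is added on the assumption that a tagger may emit this newer spelling of `PP$`. The function names and the hand-tagged phrase in the usage note are hypothetical.

```python
from typing import List, Tuple

# Penn tags treated as English functional words (list copied from the slide).
# "PRP$" is an added assumption: it is the newer spelling of "PP$".
EN_FW_TAGS = frozenset({
    "CC", "DT", "IN", "MD", "PRP", "PP$", "PRP$", "RP",
    "TO", "WDT", "WP", "WP$", "WRB",
})


def is_en_functional(tag: str) -> bool:
    """True if a Penn POS tag marks an English functional word."""
    return tag in EN_FW_TAGS


def functional_words(tagged: List[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (position, word, tag) for every functional word in a tagged phrase."""
    return [(i, w, t) for i, (w, t) in enumerate(tagged) if is_en_functional(t)]
```

Consider "at the end of the financial year", tagged by hand as IN DT NN IN DT JJ NN. For this phrase, `functional_words` returns the two prepositions and the two determiners: (0, 'at', 'IN'), (1, 'the', 'DT'), (3, 'of', 'IN'), (4, 'the', 'DT').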
English syntactic constructions with functional words:
[ {Phrasal-Tag}* POS-Tag/FW {Phrasal-Tag}* ]
where FW denotes a functional word.
Examples:
[NP, PRN, CC/and, NP]
[RB, JJ, CC/and, JJ]
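The construction notation can be read mechanically. The sketch below assumes that items are separated by commas and that a lexicalised item is written TAG/word. It splits a construction into (tag, word) pairs and checks the schema's shape of exactly one TAG/FW item. The helper names are hypothetical, and the sketch does not check the FW tag against the functional-word tag list.

```python
from typing import List, Optional, Tuple

# (tag, word): word is None for a bare tag such as NP or JJ.
Item = Tuple[str, Optional[str]]


def parse_construction(text: str) -> List[Item]:
    """Split e.g. '[NP, PRN, CC/and, NP]' into (tag, word) items."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a bracketed construction: {text!r}")
    items: List[Item] = []
    for part in body[1:-1].split(","):
        tag, sep, word = part.strip().partition("/")
        items.append((tag, word if sep else None))
    return items


def anchor_positions(items: List[Item]) -> List[int]:
    """Indices of lexicalised items (TAG/word), i.e. the functional-word slots."""
    return [i for i, (_, word) in enumerate(items) if word is not None]


def matches_schema(items: List[Item]) -> bool:
    """Schema shape: any tags around exactly one TAG/FW item."""
    return len(anchor_positions(items)) == 1
```

For both examples above, the single anchor is at index 2 (CC/and), so `matches_schema` holds.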
Romanian Syntactic Sequences
The Romanian translations of the English functional-word constructions are encoded in the same format.
Romanian functional words = words that carry one of the following tags in the MULTEXT-East tagset: Pd-, Pi-, Ps-, Px-, Pz-, D-, T-, S-, C-, Q-.
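MULTEXT-East MSDs encode the word category in their leading characters, so the Romanian test can be sketched as a prefix check. This assumes that entries such as "Pd-" on the slide mean "any MSD starting with Pd". The function name is hypothetical.

```python
# MSD prefixes marking Romanian functional words (list copied from the slide).
# Standard MULTEXT-East readings: Pd/Pi/Ps/Px/Pz = demonstrative, indefinite,
# possessive, reflexive and negative pronouns; D = determiner, T = article,
# S = adposition, C = conjunction, Q = particle.
RO_FW_PREFIXES = ("Pd", "Pi", "Ps", "Px", "Pz", "D", "T", "S", "C", "Q")


def is_ro_functional(msd: str) -> bool:
    """True if a MULTEXT-East MSD starts with one of the functional-word prefixes."""
    return msd.startswith(RO_FW_PREFIXES)
```

With this check, Crssp (și), Tsfs (a) and Spsa (în) are flagged as functional words, while Ncms-n (vot) and Rgp (de-asemenea) are not.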
Romanian syntactic constructions:
[ {Phrasal-Tag}* MULTEXT-East-Tag/FW {Phrasal-Tag}* ]
where FW denotes a functional word.
Examples:
[Di3-po---e/altor, NP]
[VP, Crssp/și, Tsfs/a, NP, PUNCT]
English-Romanian Parallel Sequences with Syntactic Constituents
Figure: The resulting parallel sequences were saved into a database with four fields: SynPhrase EN, SynPhrase RO, Treebank EN, Treebank RO.
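Below is one possible layout for such a database, sketched with Python's built-in sqlite3 module. The table and column names are assumptions derived from the four field names. A frequency query like `translations_of` is one way to see which Romanian constructions an English construction is translated into.

```python
import sqlite3
from typing import List, Tuple

# Assumed schema: one row per aligned construction pair. Column names are
# hypothetical renderings of the four fields named in the figure caption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS parallel_sequences (
    id           INTEGER PRIMARY KEY,
    synphrase_en TEXT NOT NULL,  -- English construction, e.g. '[IN/by, NP]'
    synphrase_ro TEXT NOT NULL,  -- Romanian construction in the same notation
    treebank_en  TEXT NOT NULL,  -- bracketed English subtree
    treebank_ro  TEXT NOT NULL   -- bracketed Romanian subtree
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_pair(conn: sqlite3.Connection, syn_en: str, syn_ro: str,
             tb_en: str, tb_ro: str) -> None:
    """Store one aligned English-Romanian sequence pair."""
    conn.execute(
        "INSERT INTO parallel_sequences "
        "(synphrase_en, synphrase_ro, treebank_en, treebank_ro) VALUES (?, ?, ?, ?)",
        (syn_en, syn_ro, tb_en, tb_ro),
    )


def translations_of(conn: sqlite3.Connection, syn_en: str) -> List[Tuple[str, int]]:
    """Romanian constructions observed for one English construction, most frequent first."""
    return conn.execute(
        "SELECT synphrase_ro, COUNT(*) AS n FROM parallel_sequences "
        "WHERE synphrase_en = ? GROUP BY synphrase_ro ORDER BY n DESC, synphrase_ro",
        (syn_en,),
    ).fetchall()
```

Suppose two [IN/by, NP] rows are stored whose Romanian SynPhrase is written as "[Spca/de-către, NP]" (an illustrative value, not taken from the corpus). Then `translations_of(conn, "[IN/by, NP]")` returns [("[Spca/de-către, NP]", 2)].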
English Syntactic Sequences with FW
[DT NN NN]
DT + NN → N----y (y: definiteness)
DT + NN → T---- + N----n (T: article)
DT + NN → D---------- + N----n (D: determiner)
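The three rules can be checked against the Romanian MSDs of an aligned phrase. In this sketch, definiteness is read from the sixth character of a noun MSD, as the N----y / N----n notation suggests (e.g. Ncfsry for încheierea, Ncmsry for directorul). The function names are hypothetical, and any sequence outside the three shapes returns None.

```python
from typing import List, Optional


def definiteness(msd: str) -> Optional[str]:
    """Definiteness value of a Romanian noun MSD: its 6th character ('y' or 'n')."""
    if msd.startswith("N") and len(msd) > 5:
        return msd[5]
    return None


def dt_nn_pattern(ro_msds: List[str]) -> Optional[str]:
    """Name the DT + NN realisation followed by a Romanian MSD sequence, if any."""
    if len(ro_msds) == 1 and definiteness(ro_msds[0]) == "y":
        # definite article attached to the noun itself, e.g. Ncfsry (încheierea)
        return "N----y"
    if len(ro_msds) == 2 and definiteness(ro_msds[1]) == "n":
        if ro_msds[0].startswith("T"):
            return "T---- + N----n"        # separate article + noun marked n
        if ro_msds[0].startswith("D"):
            return "D---------- + N----n"  # determiner + noun marked n
    return None
```

The translations of "the end" (Ncfsry) and "the director-general" (Ncmsry) both fall under the first rule.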
[IN/as, NP]
Treebank EN: [PP [IN Rsp/as] [NP [NP Afp/strict] [ADJP [RB Cs/as] [JJ Afp/possible]]]]
Treebank RO: [PP [Rw 14/cât] [Rp 15/mai] [ADJP [Afpfp-n 16/stricte] [ADJP [Rgp 17/posibil]]]]
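The Treebank EN and Treebank RO strings use a simple bracket notation, [LABEL child ...], in which a leaf holds one token such as 14/cât. Below is a minimal recursive-descent reader for this notation. It assumes balanced brackets and no elided "..." parts, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# A node is (label, children); a child is a sub-node or a raw token like "14/cât".
Node = Tuple[str, list]


def parse_tree(s: str) -> Node:
    """Parse '[LABEL child ...]' bracket notation into nested (label, children) tuples."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", s)
    pos = 0

    def peek() -> str:
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: input ended inside a constituent")
        return tokens[pos]

    def node() -> Node:
        nonlocal pos
        if peek() != "[":
            raise ValueError(f"expected '[' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        label = peek()
        pos += 1
        children: list = []
        while peek() != "]":
            if tokens[pos] == "[":
                children.append(node())        # nested constituent
            else:
                children.append(tokens[pos])   # leaf token
                pos += 1
        pos += 1  # consume the closing "]"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("extra material after the first complete constituent")
    return tree


def tree_leaves(tree: Node) -> List[Tuple[str, str]]:
    """(label, word) for every token, keeping only the text after the last '/'."""
    label, children = tree
    out: List[Tuple[str, str]] = []
    for child in children:
        if isinstance(child, str):
            out.append((label, child.rsplit("/", 1)[-1]))
        else:
            out.extend(tree_leaves(child))
    return out
```

For the Romanian side of the example above, the root label is PP and the leaves are ('Rw', 'cât'), ('Rp', 'mai'), ('Afpfp-n', 'stricte'), ('Rgp', 'posibil').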
[IN/at, NP]
Treebank EN: [PP [IN Sp/at] [NP [RBS Pi3-p/most]]]
Treebank RO: [PP [Rgp 3/maximum]]

Treebank EN: [PP [IN Sp/at] [NP [NP [DT Dd/the] [NN Ncns/end]] [PP [IN Sp/of] [NP [DT Dd/the] [JJ Afp/financial] [NN Ncns/year]]]]]
Treebank RO: [NP [Spsa 1/la] [NP [NP [Ncfsry 2/încheierea]] [NP [Ncmsoy 3/exercițiului] [ADJP [Afpms-n 4/financiar]]]]]
[IN/by, NP]
Treebank EN: [PP [IN Sp/by] [NP [DT Dd/the] [NN Np/director-general]]]
Treebank RO: [NP [Spca 7/de-către] [NP [Ncmsry 8/directorul] [Afpms-n 9/general]]]

Treebank EN: [PP [IN Sp/by] [NP [NP [DT Dd/the] [NN Ncns/agency]] ...]
Treebank RO: [NP [Spca 14/de-către] [NP [NP [Ncfsrn 15/Agenție]] ...]
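A leaf-level scan is enough to pair an English preposition with its Romanian counterpart in one aligned tree pair. The sketch treats every [TAG token] bracket as a leaf. It takes the first IN word on the English side and the first adposition (MSD starting with S) on the Romanian side. This pairing heuristic is an assumption for illustration, not the procedure used in this work, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Heuristic leaf pattern: "[TAG token]" with no nested brackets, where token is
# "MSD/word" on the English side or "index/word" on the Romanian side.
LEAF = re.compile(r"\[([^\[\]\s]+) ([^\[\]\s]+)\]")


def leaves(tree: str) -> List[Tuple[str, str]]:
    """(tag, word) for each leaf, keeping only the text after the last '/'."""
    return [(tag, tok.rsplit("/", 1)[-1]) for tag, tok in LEAF.findall(tree)]


def preposition_pair(tree_en: str, tree_ro: str) -> Tuple[Optional[str], Optional[str]]:
    """First English IN word and first Romanian adposition (MSD starting with 'S')."""
    en = next((word for tag, word in leaves(tree_en) if tag == "IN"), None)
    ro = next((word for tag, word in leaves(tree_ro) if tag.startswith("S")), None)
    return en, ro
```

On the director-general pair this yields ("by", "de-către"), and on "at the end ..." it yields ("at", "la"). For "at most" / "maximum" it yields ("at", None), because the Romanian side contains no adposition.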