Selecting Proper Lexical Paraphrase for Children
1. Selecting Proper Lexical Paraphrase for Children
Tomoyuki Kajiwara
Hiroshi Matsumoto
Kazuhide Yamamoto
Nagaoka University of Technology
2. Lexical Paraphrase for Children
• Elementary school Japanese dictionary
– 【大詰め: final stage】芝居の最後の場面 (The last scene of the play)
• Newspaper for Adults
– 大詰めの大一番 (Big match of the final stage)
• Newspaper for Children
– 最後の大一番 (Big match of the last)
– 最後 (last) is selected by its similarity to the headword
• Basic Vocabulary to Learn: 5,404 words
• Total annual number of vocabulary: 200,000 words
3. BVL: Basic Vocabulary to Learn
• General Vocabulary (GV): vocabulary registered in the general dictionary
• Vocabulary to Learn (VL), 25,000 words: vocabulary registered in the elementary school dictionary
• Basic Vocabulary to Learn (BVL), 5,404 words: vocabulary that elementary school students can use
• Basic Vocabulary, 2,000 words: the minimum vocabulary sufficient for daily living
• Paraphrase to BVL from GV and VL: reading assistance for elementary school students
4. Related Works
• Paraphrasing with a dictionary
– headword → headword
• Fujita et al. (2000), Mino and Tanaka (2011)
– headword → word at the end of the definition statement
• Kaji et al. (2002), Mino and Tanaka (2011), Kajiwara and Yamamoto (2013)
"The definition statements are simpler than the headwords"
"The last segment represents the meaning of the headword"
5. Problem of Related Works
Definition
【大詰め】芝居の最後の場面
【final stage】The last scene of the play
Paraphrase
✕ 大詰めの大一番 → 場面の大一番
Big match of the final stage → Big match of the scene
✔ 大詰めの大一番 → 最後の大一番
Big match of the final stage → Big match of the last
Appropriate target words are not always
found at the end of definitions
7. Proposed Method (1/2)
• Acquisition of the Target Word Candidates
① A difficult word is extracted from the original sentence
② The dictionary entries for the difficult word are looked up
③ Words in the definition statements are extracted if they have the same part of speech as the difficult word
Example
• ① Original sentence: ・・・ professor ・・・
• ② Japanese dictionary entries:
【professor】People of status as professor.
【professor】Status as professor.
【professor】Teach learning and skill.
【professor】University teacher.
• ③ Candidates: People, Status, Professor, Learning, Skill, University, Teacher
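The candidate acquisition steps ① to ③ can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `acquire_candidates` and the toy part-of-speech lexicon `POS` are hypothetical, and a real system for Japanese would use a morphological analyzer instead of a hand-written lexicon. The definitions are the English glosses from the slide.

```python
import re

# Hypothetical toy part-of-speech lexicon (assumption for illustration only).
POS = {
    "people": "noun", "status": "noun", "professor": "noun",
    "learning": "noun", "skill": "noun", "university": "noun",
    "teacher": "noun", "teach": "verb",
    "of": "preposition", "as": "preposition", "and": "conjunction",
}

def acquire_candidates(difficult_word, definitions, pos_lexicon):
    """Collect every word in all definitions that shares the
    part of speech of the difficult word (first occurrence order)."""
    target_pos = pos_lexicon[difficult_word.lower()]
    candidates = []
    for definition in definitions:
        for token in re.findall(r"[A-Za-z]+", definition.lower()):
            if pos_lexicon.get(token) == target_pos and token not in candidates:
                candidates.append(token)
    return candidates

definitions = [
    "People of status as professor.",
    "Status as professor.",
    "Teach learning and skill.",
    "University teacher.",
]
candidates = acquire_candidates("professor", definitions, POS)
print(candidates)
```

Unlike the end-of-definition approach, every position in every definition statement contributes candidates, so words such as "people" or "learning" are kept even though they are not the last segment.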
8. Proposed Method (2/2)
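• Selection of the Proper Target Word
④ Simple words (those in the Basic Vocabulary to Learn) are extracted from the candidates
⑤ The similarity of meaning between each simple word and the difficult word is calculated
⑥ The simple word with the highest similarity is selected
Example
• Candidates: People, Status, Professor, Learning, Skill, University, Teacher
• ④ Candidates in the Basic Vocabulary to Learn: People, Learning, University, Skill, Teacher
• ⑤ Similarities shown in the figure: 0.17, 0.11, 0.08, 0.13, 0.25
• ⑥ The simple word with the highest similarity (0.25) is selected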
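The selection steps ④ to ⑥ can be sketched as a filter followed by an argmax. This is a hedged illustration: `select_target` is a hypothetical name, and the similarity table stands in for a thesaurus-based measure. The five score values are the ones shown on the slide, but which value belongs to which word is an assumption made only for this example.

```python
def select_target(candidates, basic_vocabulary, similarity):
    """Keep only simple words (step 4), score them (step 5),
    and return the one with the highest similarity (step 6)."""
    simple_words = [w for w in candidates if w in basic_vocabulary]
    if not simple_words:
        return None  # no simple candidate, so no paraphrase is produced
    return max(simple_words, key=similarity)

candidates = ["people", "status", "professor", "learning",
              "skill", "university", "teacher"]
basic_vocabulary = {"people", "learning", "university", "skill", "teacher"}

# Assumed word-to-score assignment (illustrative only).
assumed_scores = {"people": 0.17, "learning": 0.11, "university": 0.08,
                  "skill": 0.13, "teacher": 0.25}

selected = select_target(candidates, basic_vocabulary,
                         lambda w: assumed_scores[w])
print(selected)
```

Passing the similarity as a function keeps the selection logic independent of the particular measure, so a WordNet-based score could be swapped in without changing `select_target`.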
10. Comparative Methods
• Acquisition of the Target Word Candidates
– One word is extracted from the end of each definition statement if it has the same part of speech as the difficult word
• Selection of the Proper Target Word
– Weighted voting over the following methods
• Frequency
• Co-occurrence frequency
• Point-wise Mutual Information
• Tri-gram frequency
• Cosine similarity between document vectors
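One way to read "weighted voting" is sketched below. The slides do not specify the voting scheme, so this assumes that each method casts one vote for its top-ranked candidate and that votes are summed with a per-method weight. The function name `weighted_vote`, the method keys, the votes, and the weights are all hypothetical.

```python
from collections import defaultdict

def weighted_vote(votes, weights):
    """Sum the weight of every method voting for each candidate and
    return the candidate with the largest total.
    Ties are broken alphabetically so the result is deterministic."""
    totals = defaultdict(float)
    for method, candidate in votes.items():
        totals[candidate] += weights.get(method, 1.0)
    return max(sorted(totals), key=lambda w: totals[w])

# Hypothetical top choice of each comparative method.
votes = {
    "frequency": "people",
    "cooccurrence": "teacher",
    "pmi": "teacher",
    "trigram": "people",
    "cosine": "skill",
}
# Hypothetical weights, e.g. tuned on a held-out set of difficult words.
weights = {"frequency": 0.5, "cooccurrence": 1.0, "pmi": 1.5,
           "trigram": 0.5, "cosine": 1.0}

weighted_choice = weighted_vote(votes, weights)
unweighted_choice = weighted_vote(votes, {})  # every method counts as 1.0
print(weighted_choice, unweighted_choice)
```

With equal weights "people" and "teacher" tie at two votes each, while the assumed weights let the two stronger methods decide in favor of "teacher".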
11. Experimental Setup
• Test set: 152 difficult words that
– do not appear in BVL
– appear more than 50 times in the Mainichi Newspaper published in 2000
– include paraphrasable simple words in their definition statements
• Dictionaries: three Japanese dictionaries
• Thesaurus: Japanese WordNet
12. Procedure (1/2)
• Experiments on 52 difficult words
– Decide the weights
• Experiments on 100 difficult words
– Weighted voting
• Evaluation
– Three evaluators judge each paraphrase
– The decision is made by majority vote
– Definition of “paraphrasable”: the simple word can replace the difficult word in the original sentence
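The majority-vote evaluation can be sketched as below. The function names and the sample judgements are hypothetical and only illustrate how three binary judgements per paraphrase would be reduced to one decision and then to an overall rate.

```python
def is_paraphrasable(judgements):
    """True when more than half of the evaluators accept the paraphrase."""
    return sum(judgements) * 2 > len(judgements)

def paraphrasable_ratio(all_judgements):
    """Fraction of paraphrases accepted by majority vote."""
    decisions = [is_paraphrasable(j) for j in all_judgements]
    return sum(decisions) / len(decisions)

# Hypothetical judgements: one tuple of three evaluator decisions per paraphrase.
sample = [
    (True, True, False),
    (False, False, True),
    (True, True, True),
    (False, True, True),
]
ratio = paraphrasable_ratio(sample)
print(ratio)
```

With three evaluators a tie is impossible, so every paraphrase receives a definite label.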
13. Procedure (2/2)
① The difficult word is extracted
Original sentence: ・・・ professor ・・・
② Entries for "professor" are searched in the Japanese dictionary
【professor】People of status as professor.
【professor】Status as professor.
【professor】Teach learning and skill.
【professor】University teacher.
③ Nouns are extracted
People, Status, Professor, Learning, Skill, University, Teacher
④ Simple words are extracted (Basic Vocabulary to Learn)
People, Learning, University, Skill, Teacher
⑤ Similarities of meaning are calculated
0.17, 0.11, 0.08, 0.13, 0.25
14. Result (1/3)
• Acquisition of the Target Word Candidates
– More paraphrasable simple words are acquired by the proposed method
– The difference is only 3.2 points

Method        Number of paraphrasable words   Percentage of paraphrasable words
Proposed      165 / 221                       74.7 %
Comparative   158 / 221                       71.5 %

Many paraphrasable simple words appear at the end of definition statements.
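The percentages and the 3.2-point difference follow directly from the counts reported on the slide. A short check, using a hypothetical helper name `paraphrasable_rate`:

```python
def paraphrasable_rate(paraphrasable, total):
    """Percentage of paraphrasable words, rounded to one decimal place."""
    return round(100 * paraphrasable / total, 1)

proposed = paraphrasable_rate(165, 221)
comparative = paraphrasable_rate(158, 221)
difference = round(proposed - comparative, 1)
print(proposed, comparative, difference)
```

The gap corresponds to 7 additional paraphrasable words out of 221.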
15. Result (2/3)
[Bar chart, axis scale 0 to 70. Results are shown for acquisition by the comparative method and acquisition by the proposed method, for each selection method:]
【Baseline】Randomness
【Proposed】WordNet-similarity
(1) Frequency
(2) Co-occurrence frequency
(3) Point-wise Mutual Information
(4) Tri-gram frequency
(5) Cosine similarity
16. Result (3/3)
[Bar chart, axis scale 0 to 70. Results are shown for acquisition by the comparative method and acquisition by the proposed method, for each selection method:]
【Baseline】Randomness
【Proposed】WordNet-similarity
A) Unweighted voting by comparative methods (1)-(5)
B) Weighted voting by comparative methods (1)-(5)
C) Unweighted voting that adds the WordNet-similarity to A)
D) Weighted voting that adds the WordNet-similarity to B)
17. Erroneous Examples (1/2)
• Two or more simple words have the highest similarity
Example
• Original: A summary of the main points.
• Definition: 【Points】essential, score, game, spot
– essential: similarity 1.0
– score: similarity 1.0
– game: similarity 1.0
– spot: similarity 1.0
Methods using frequency or context information selected the paraphrasable word.
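The tie in this example can be detected mechanically, and a secondary score can then be used to choose among the tied words. The sketch below is only an illustration of that idea, not a method proposed in the slides: `select_with_tiebreak` is a hypothetical name and the secondary scores are placeholder values standing in for a frequency-based or context-based score.

```python
def select_with_tiebreak(similarities, secondary):
    """Return (chosen word, tied words). Words sharing the highest
    similarity are ranked by a secondary score."""
    best = max(similarities.values())
    tied = [w for w, s in similarities.items() if s == best]
    chosen = max(tied, key=lambda w: secondary.get(w, 0.0))
    return chosen, tied

# Similarities from the slide: all four candidates score 1.0.
similarities = {"essential": 1.0, "score": 1.0, "game": 1.0, "spot": 1.0}
# Placeholder secondary scores (assumption, not measured data).
secondary = {"essential": 0.9, "score": 0.4, "game": 0.2, "spot": 0.3}

chosen, tied = select_with_tiebreak(similarities, secondary)
print(chosen, tied)
```

Returning the tied list as well makes it easy to count how often similarity alone fails to single out one target word.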
18. Erroneous Examples (2/2)
• A non-paraphrasable word has the highest similarity
Example
• Original: I can play the program during recording.
• Definition: 【Play】Use the garbage again. What was gone once again regains power and life.
– use: paraphrasable, similarity 0.8
– power: non-paraphrasable, similarity 1.0
Methods using frequency or context information selected the paraphrasable word.
19. Conclusion
We paraphrase a difficult word into the simple word with the highest similarity, using the whole definition statements.
• Acquisition of the Target Word Candidates
– More paraphrasable simple words are acquired
– Many of them appear at the end of definitions
• Selection of the Proper Target Word
– Selection based on similarity is better than selection by frequency or context information