1. Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis
Kazuki TAKIGAWA and Kazuhide YAMAMOTO
Department of Electrical Engineering
Nagaoka University of Technology, JAPAN
{takigawa,yamamoto}@jnlp.org
2. Background
The main processing units have some problems in Japanese.
• bag-of-words
o It is difficult to capture the sense of an expression.
ex.) “かける [kakeru]” has several meanings, such as “do up”, “put on”, and “take out”.
• word n-gram
o It often creates unnecessary elements.
ex.) “で-ある-こと [de-aru-koto]” (3-gram)
A processing unit that preserves the meaning of an expression is needed.
3. Background
Main processing units in NLP:
• bag-of-words
o It is difficult to capture the sense of an expression.
o ex.) the word 「かける」
• word n-gram
o It often creates unnecessary elements.
ex.) 「が,かける」 (2-gram), 「で,ある,こと」 (3-gram)
A processing unit that preserves the meaning of an expression is needed.
We propose the “syntactic piece”.
4. What’s Syntactic Piece?
• A syntactic piece is a minimal unit of syntactic structure.
• It consists of a pair of a modifier and a modificand, derived from the syntactic structure.
• This pair is expressed as: modifier → modificand
Example: Recently, the surrounding noise is very big. (最近まわりの騒音がとても大きい)
recently big: 最近 → 大きい
very big: とても → 大きい
surrounding noise: まわりの → 騒音
noise is big: 騒音が → 大きい
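To make the modifier → modificand idea concrete, here is a minimal Python sketch that lists the syntactic pieces of the example sentence. It assumes the sentence has already been split into chunks (bunsetsu) and parsed by some dependency parser; the `Chunk` record, its fields, and the hyphen between a content word and its particle (the convention later slides use, e.g. 騒音-が) are illustrative assumptions, not the authors' data format.

```python
from typing import List, NamedTuple, Tuple

class Chunk(NamedTuple):
    """One chunk (bunsetsu) of a dependency-parsed sentence (hypothetical format)."""
    content: str   # content word, e.g. "騒音"
    particle: str  # attached particle, "" if none
    head: int      # index of the chunk this one modifies, -1 for the root

def syntactic_pieces(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Return one (modifier, modificand) pair per dependency edge."""
    pieces = []
    for chunk in chunks:
        if chunk.head < 0:
            continue  # the root chunk modifies nothing
        modifier = chunk.content + (f"-{chunk.particle}" if chunk.particle else "")
        modificand = chunks[chunk.head].content  # modificand shown without its particle
        pieces.append((modifier, modificand))
    return pieces

# 最近 / まわりの / 騒音が / とても / 大きい
sentence = [
    Chunk("最近", "", 4),
    Chunk("まわり", "の", 2),
    Chunk("騒音", "が", 4),
    Chunk("とても", "", 4),
    Chunk("大きい", "", -1),
]
for modifier, modificand in syntactic_pieces(sentence):
    print(f"{modifier} → {modificand}")
```

This prints the four pieces from the slide in sentence order: 最近 → 大きい, まわり-の → 騒音, 騒音-が → 大きい, とても → 大きい. Like an n-gram extractor, it yields a flat list of string units, but each unit follows a dependency edge rather than adjacency.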
5. Advantages of Syntactic Piece
Very simple
It is as easy to use as n-grams.
It has syntactic structure
It contains more information than n-grams.
Similar to phrasal idioms
It can handle a chunk of meaning.
But the syntactic piece has some problems.
7. Problems of Syntactic Piece
1) Syntactic pieces tend to be long, because a syntactic piece is a pair of phrases. So if we use syntactic pieces, we get many unique expressions.
2) The phrase pairs generated by the current method include some pairs that have no meaning.
We propose solutions to these problems.
8. Method(1)
- Generalization of Same Class Expressions -
We generalize “same class expressions” to decrease the number of unique expressions.
“Same class expressions” are a set of expressions that have similar meanings even if their surfaces differ.
1. cake is delicious (ケーキ-が → おいしい)
2. delicious cake (おいしい → ケーキ)
These two expressions differ in surface structure, but their meanings are very similar.
We call such expressions “same class expressions”.
9. Method(1)
- Generalization of Same Class Expressions -
We generalize same class expressions. Same class expressions are defined by two criteria.
(1) Syntactic pieces consisting of an adjective and a noun with the same content words
noun(-particle) → adjective: 騒音-が → 大きい (noise is big)
adjective → noun: 大きい → 騒音 (big noise)
(2) Syntactic pieces consisting of a verb and a noun with the same content words
noun(-particle) → verb: 子供-が → 楽しむ (a child rejoices)
verb → noun: 楽しむ → 子供 (rejoicing child)
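One way to read the two criteria is as a normalization step: when a piece pairs a noun with an adjective or a verb, drop the direction and the particle and keep only the two content words, so that both members of a same class pair map to the same dictionary key. The Python sketch below assumes coarse part-of-speech tags (`noun`, `adj`, `verb`) from some morphological analyzer and ignores which particle is attached; the key format and the `Word` record are illustrative assumptions, not the authors' implementation.

```python
from typing import NamedTuple, Tuple

class Word(NamedTuple):
    content: str        # content word, e.g. "騒音"
    pos: str            # hypothetical coarse tag: "noun", "adj", "verb", ...
    particle: str = ""  # attached particle, e.g. "が"

def surface(word: Word) -> str:
    return word.content + (f"-{word.particle}" if word.particle else "")

def generalize(modifier: Word, modificand: Word) -> Tuple[str, ...]:
    """Map a syntactic piece to a key shared by its same class expressions.

    Criterion (1): noun(-particle) → adjective  and  adjective → noun
    Criterion (2): noun(-particle) → verb       and  verb → noun
    Any other piece keeps its direction and particle.
    """
    by_pos = {modifier.pos: modifier, modificand.pos: modificand}
    for pred in ("adj", "verb"):
        if set(by_pos) == {"noun", pred}:
            # Only the two content words survive, so direction no longer matters.
            return ("noun", by_pos["noun"].content, pred, by_pos[pred].content)
    return (surface(modifier), "→", modificand.content)

noise_is_big = generalize(Word("騒音", "noun", "が"), Word("大きい", "adj"))
big_noise = generalize(Word("大きい", "adj"), Word("騒音", "noun"))
print(noise_is_big == big_noise)  # True
```

With a key like this, an SO-score learned for “noise is big” would also be found for “big noise”, which is the benefit the deck later cites as a reason for applying the method to sentiment analysis.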
12. Method(2)
- Coping with form word -
Some phrase pairs have no meaning.
Example: I can be satisfied. (満足することができる)
be satisfied: 満足する → こと (manzoku-suru koto)
There is no real modification relation.
I can be: こと-が → できる (koto-ga dekiru)
It has no meaning.
15. Method(2)
- Coping with form word -
The reason for this problem is the treatment of “こと [koto]”, which is a “form word”.
A form word is a type of content word, but it has lost its original meaning and is used formally in Japanese.
This is similar to relative pronouns such as “which”, “who”, and “when” in English.
16. Method(2)
- Coping with form word -
• We collected form words manually.
• We treat a phrase containing a form word as a function word of the preceding content word.
Example: I can be satisfied very much. (とても満足することができる)
Conventional syntactic piece:
be satisfied: 満足する → こと
I can be: こと-が → できる
satisfied very much: とても → 満足する
Coping with form word:
I can be satisfied: 満足すること-が → できる
satisfied very much: とても → 満足する
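The merging step can be sketched in Python as follows. It assumes hypothetical chunk records of the kind a dependency parser would produce, and a tiny sample form-word list (`こと`, `もの`, `ところ`); the authors' manually collected list is not given in the slides. A form-word chunk that is directly modified by the preceding chunk is absorbed into it as function material, and dependency heads are re-pointed to the merged chunks.

```python
from typing import List, NamedTuple

# Hypothetical sample: the authors' manually collected list is not shown.
FORM_WORDS = {"こと", "もの", "ところ"}

class Chunk(NamedTuple):
    content: str    # content word(s) of the chunk
    particle: str   # trailing particle, "" if none
    head: int       # index of the modified chunk, -1 for the root
    form: str = ""  # form word absorbed as function material

def merge_form_words(chunks: List[Chunk]) -> List[Chunk]:
    """Absorb each form-word chunk into the chunk that precedes and modifies it."""
    merged: List[Chunk] = []
    new_index: List[int] = []  # original chunk index -> merged chunk index
    for i, ch in enumerate(chunks):
        if ch.content in FORM_WORDS and i > 0 and chunks[i - 1].head == i:
            prev = merged[-1]
            # The form word now acts like a function word of the previous content word.
            merged[-1] = prev._replace(form=prev.form + ch.content,
                                       particle=ch.particle, head=ch.head)
        else:
            merged.append(ch)
        new_index.append(len(merged) - 1)
    # Re-point heads from original indices to merged indices.
    return [c._replace(head=new_index[c.head] if c.head >= 0 else -1)
            for c in merged]

def pieces(chunks: List[Chunk]) -> List[str]:
    """Render modifier → modificand pairs; the modificand shows its content word only."""
    out = []
    for c in chunks:
        if c.head >= 0:
            mod = c.content + c.form + (f"-{c.particle}" if c.particle else "")
            out.append(f"{mod} → {chunks[c.head].content}")
    return out

# とても / 満足する / ことが / できる
sent = [Chunk("とても", "", 1), Chunk("満足する", "", 2),
        Chunk("こと", "が", 3), Chunk("できる", "", -1)]
print(pieces(sent))                    # ['とても → 満足する', '満足する → こと', 'こと-が → できる']
print(pieces(merge_form_words(sent)))  # ['とても → 満足する', '満足すること-が → できる']
```

The meaningless pair こと-が → できる disappears, while とても → 満足する is unaffected because the modificand is still rendered by its content word.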
18. Application to Sentiment Analysis
• We apply the improved syntactic piece to sentiment analysis to verify its effectiveness.
• The target of sentiment analysis is a sentence, and each sentence is classified as positive, negative, or other.
1. Pairs of an evaluative expression and its semantic orientation score (SO-score) are registered in a dictionary.
In this work: evaluative expression = syntactic piece
2. Each expression in an input sentence is given an SO-score from the dictionary.
3. A sentence is classified by the sum of its SO-scores.
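Steps 2 and 3 amount to a dictionary lookup followed by a sum. A minimal Python sketch is below; the decision rule (positive if the sum is above zero, negative if below, other otherwise) and the score -0.8 for “noise is big” are assumptions for illustration, since the slides give neither the thresholds nor the actual scores.

```python
from typing import Dict, Iterable

def classify(pieces: Iterable[str], so_dict: Dict[str, float]) -> str:
    """Sum the SO-scores of the dictionary entries matched by the pieces."""
    total = sum(so_dict.get(piece, 0.0) for piece in pieces)  # unmatched pieces add 0
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"  # no evidence either way (assumed decision rule)

# Hypothetical dictionary entry: "noise is big" is negative.
so_dict = {"騒音-が → 大きい": -0.8}
# Pieces of "The noise of the fan is big." (ファンの騒音が大きい。)
print(classify(["ファン-の → 騒音", "騒音-が → 大きい"], so_dict))  # negative
```

Unmatched pieces such as ファン-の → 騒音 contribute nothing, so a sentence with no dictionary hits falls into “other”.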
19. Sentence Classification (example)
input: The noise of the fan is big. (ファンの騒音が大きい。)
1. Syntactic pieces are obtained from the input:
noise of fan (ファン-の → 騒音)
noise is big (騒音-が → 大きい)
2. The obtained syntactic pieces are matched against the entries of the dictionary (SO of syntactic pieces), which contains:
noise is big (騒音-が → 大きい): negative
3. Through this match, we can treat “noise is big” as negative.
4. The SO of the input is negative.
24. Reason for Applying to Sentiment Analysis
• This method uses a dictionary, so if we have the SO-score of the expression “noise is big”, we can also give an SO-score to “big noise” through same class expressions.
• Thanks to coping with form word, the dictionary should not contain meaningless expressions, such as an entry stating that “I can be” is “positive”.
25. Preparation for Sentence Classification
- Making of Seed Dictionary -
1. We prepare positive and negative sentences as training data.
2. Syntactic pieces are obtained from the training data, and their frequencies are counted.
3. Each syntactic piece is given an SO-score, and we treat the result as the seed dictionary.
The SO-score is calculated from the probability of occurrence (Fujimura et al. [04]).

Seed dictionary (frequency in training data):
syntactic piece     positive  negative
size is big             5         1
slow to respond         0         8
softly-colored          3         0
・
・
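The slides state that the SO-score is calculated from the probability of occurrence following Fujimura et al. [04], but they do not give the formula. The Python sketch below therefore uses a simple stand-in, SO = (pos - neg) / (pos + neg), which equals P(positive | piece) - P(negative | piece) estimated from raw counts; it is not necessarily the formula the authors used. Each training sentence is represented by its list of syntactic pieces.

```python
from collections import Counter
from typing import Dict, List

Sentence = List[str]  # a sentence represented by its syntactic pieces

def build_dictionary(pos_sents: List[Sentence],
                     neg_sents: List[Sentence]) -> Dict[str, float]:
    """Count each piece in positive and negative sentences and turn counts into SO-scores."""
    pos = Counter(piece for sent in pos_sents for piece in sent)
    neg = Counter(piece for sent in neg_sents for piece in sent)
    so_dict = {}
    for piece in pos.keys() | neg.keys():
        p, n = pos[piece], neg[piece]
        # Stand-in score in [-1, 1]: P(positive | piece) - P(negative | piece)
        so_dict[piece] = (p - n) / (p + n)
    return so_dict

# Toy training data reproducing the frequencies in the seed-dictionary table
pos_sents = [["size is big"]] * 5 + [["softly-colored"]] * 3
neg_sents = [["size is big"]] + [["slow to respond"]] * 8
seed = build_dictionary(pos_sents, neg_sents)
print(round(seed["size is big"], 2), seed["slow to respond"], seed["softly-colored"])
```

With the table's counts this prints 0.67 -1.0 1.0: “size is big” (5 vs. 1) leans positive, while the other two pieces occur in only one class.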
30. Preparation for Sentence Classification
- Expansion of Dictionary -
The more evaluative expressions, the better.
For this, we need a huge amount of training data.
Preparing it manually is costly.
We want to obtain training data automatically.
So we build an expanded dictionary.
31. Preparation for Sentence Classification
- Expansion of Dictionary -
1. Sentences from a large-scale corpus are classified as positive or negative using the seed dictionary. We treat the result as new training data.
2. Syntactic pieces are obtained from the new training data, and their frequencies are counted, as in making the seed dictionary.
3. SO-scores are also computed, and we treat the result as the expanded dictionary.

Expanded dictionary (frequency in new training data):
syntactic piece             positive  negative
continuing is difficult         0         5
good design                     8         0
to be a gift                    5         1
・
・
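The expansion is one round of self-training: the seed dictionary labels the unlabeled corpus, and the labeled sentences go back through the same counting step. A self-contained Python sketch follows. It assumes that sentences the seed dictionary labels as “other” are not used, that the expanded dictionary is built from the new training data alone, and it uses the stand-in SO formula (pos - neg) / (pos + neg) with a sign-based decision rule; the slides specify none of these details, and the seed scores are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Sentence = List[str]  # a sentence represented by its syntactic pieces

def classify(sent: Sentence, so_dict: Dict[str, float]) -> str:
    total = sum(so_dict.get(piece, 0.0) for piece in sent)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"

def build_dictionary(pos_sents: List[Sentence],
                     neg_sents: List[Sentence]) -> Dict[str, float]:
    pos = Counter(p for s in pos_sents for p in s)
    neg = Counter(p for s in neg_sents for p in s)
    # Stand-in SO-score: (pos - neg) / (pos + neg)
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in pos.keys() | neg.keys()}

def expand_dictionary(seed: Dict[str, float],
                      corpus: List[Sentence]) -> Dict[str, float]:
    """Label the corpus with the seed dictionary, then rebuild a dictionary from it."""
    new_pos = [s for s in corpus if classify(s, seed) == "positive"]
    new_neg = [s for s in corpus if classify(s, seed) == "negative"]
    # Sentences labeled "other" are discarded (assumption).
    return build_dictionary(new_pos, new_neg)

seed = {"good design": 1.0, "slow to respond": -1.0}  # hypothetical seed scores
corpus = [
    ["good design", "to be a gift"],
    ["slow to respond", "continuing is difficult"],
    ["to be a gift"],  # no seed entry matches: labeled "other", not used
]
expanded = expand_dictionary(seed, corpus)
print(sorted(expanded.items()))
```

Pieces that never appeared in the seed dictionary, such as “to be a gift” and “continuing is difficult”, receive scores because they co-occur with seed entries in sentences the seed dictionary could label.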
35. Experiment
• We manually prepared:
● approximately 2,000 positive sentences
● approximately 1,000 negative sentences
● approximately 210,000 sentences as a large-scale corpus for expansion
• To examine the efficacy of each of our methods, we analyzed sentiment using the following settings:
(1) generalization of same class expressions only
(2) coping with form word only
(3) combination of (1) and (2)
(4) conventional syntactic piece (baseline)
36. Result
language processing unit                              precision(%)  recall(%)
(1) only generalization of same class expressions         77.1         49.8
(2) only coping with form word                            77.3         44.6
(3) (1)+(2)                                               78.7         47.7
(4) baseline                                              75.5         47.1
・All of our methods improve precision over the baseline.
・Generalization of same class expressions also improves recall.
37. Discussion
- Generalization of Same Class Expression -
• Recall was higher than that of the baseline.
We could give semantic orientation scores to more sentences, and the scale of the expanded dictionary increased.
We obtained approximately 14,000 more sentences (an increase of approximately 5.7%) as new training data than with the conventional syntactic piece.
38. Discussion
- Coping with form word -
We tried to solve the problem of extracting phrase pairs that have no meaning.
In the results, some sentences had been classified correctly only by accident when the conventional syntactic piece was used.
In the dictionary using the conventional syntactic piece:
• “think that (なる-と → 思う [naru-to → omou])” is given a positive score.
This expression has no semantic orientation.
39. Discussion
- Coping with form word -
Our method can handle the semantic orientation of each expression.
In the dictionary using our method:
• “think it will be a nuisance (邪魔になる-と → 思う [jama-ni-naru-to → omou])” is given a negative score.
• “think it will be a present (プレゼントになる-と → 思う [purezento-ni-naru-to → omou])” is given a positive score.
40. Discussion
- Comparison with other language processing units -
language processing unit            precision(%)  recall(%)
word 2-gram                             79.9         78.8
word 3-gram                             78.0         75.3
using same class expressions            77.1         49.8
Recall is lower than that of word 2-gram and word 3-gram.
41. Conclusion
• We proposed two methods for improving the syntactic piece.
• We applied the improved syntactic piece to sentiment analysis to verify its effectiveness.
• As a result, both recall and precision of the improved syntactic piece were higher than those of the conventional one.
• However, it is inferior to word 2-gram and 3-gram.
• In future work, we intend to improve recall.