Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis
Kazuki TAKIGAWA and Kazuhide YAMAMOTO
Department of Electrical Engineering
Nagaoka University of Technology, JAPAN
{takigawa,yamamoto}@jnlp.org
Background
The main processing units in NLP have some problems in Japanese.
• bag-of-words
o It is difficult to tell which sense of an expression is intended.
ex.) “かける[kakeru]” has several meanings: “do up”, “put on”, “take out” and so on.
• word n-gram
o It often creates unnecessary elements.
ex.) “が-かける[ga-kakeru] (2-gram)”, “で-ある-こと[de-aru-koto] (3-gram)”
A processing unit that can keep the meaning of an expression is needed.
We propose the “syntactic piece”.
What’s Syntactic Piece?
• A syntactic piece is a minimum unit of syntactic structure.
• It consists of a pair of a modifier and a modificand, derived from the syntactic structure.
• This pair is expressed as: modifier → modificand
ex.) Recently, the surrounding noise is very big. (最近まわりの騒音がとても大きい)
recently big: 最近→大きい
surrounding noise: まわりの→騒音
noise is big: 騒音が→大きい
very big: とても→大きい
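The pairing described above can be sketched in a few lines of Python. This is only an illustration: the `Phrase` tuple, the `syntactic_pieces` function, and the hand-written dependency structure are assumptions made for the example, and in practice the phrase boundaries and heads would come from a Japanese dependency parser.

```python
from typing import List, NamedTuple, Optional, Tuple


class Phrase(NamedTuple):
    content: str          # content part of the phrase, e.g. 騒音
    function: str         # trailing particle, "" if none, e.g. が
    head: Optional[int]   # index of the phrase this one modifies (None = root)


def syntactic_pieces(phrases: List[Phrase]) -> List[Tuple[str, str]]:
    """Emit one (modifier, modificand) pair per dependency edge."""
    pieces = []
    for phrase in phrases:
        if phrase.head is None:
            continue  # the root phrase modifies nothing
        modifier = phrase.content + phrase.function
        modificand = phrases[phrase.head].content
        pieces.append((modifier, modificand))
    return pieces


# Hand-written dependency structure for 最近まわりの騒音がとても大きい
# (an assumption for this example, not parser output).
sentence = [
    Phrase("最近", "", 4),
    Phrase("まわり", "の", 2),
    Phrase("騒音", "が", 4),
    Phrase("とても", "", 4),
    Phrase("大きい", "", None),
]

pieces = syntactic_pieces(sentence)
for modifier, modificand in pieces:
    print(f"{modifier}→{modificand}")
```

Each phrase contributes at most one piece, so a sentence with n phrases yields at most n-1 pieces.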
Advantages of Syntactic Piece
Very simple
It is as easy to use as an n-gram.
It has syntactic structure
It contains more information than an n-gram.
Similar to a phrasal idiom
It can deal with a chunk of meaning.
However, the syntactic piece also has some problems.
Problems of Syntactic Piece
1) A syntactic piece tends to be long because it is a pair of phrases. As a result, using syntactic pieces yields many unique expressions.
2) The phrase pairs generated by the current method include some pairs that do not have meaning.
We propose solutions to these problems.
Method(1)
- Generalization of Same Class Expressions -
We generalize “same class expressions” to reduce the number of unique expressions.
“Same class expressions” are a set of expressions that have similar meanings even though their surface forms differ.
1. cake is delicious (ケーキ-が→おいしい)
2. delicious cake (おいしい→ケーキ)
These two expressions differ in surface structure, but their meanings are very similar.
We call such expressions “same class expressions”.
We generalize same class expressions, which are identified by two criteria.
(1) Syntactic pieces constructed from an adjective and a noun with the same content words.
noun(-particle) → adjective: 騒音-が → 大きい (noise is big)
adjective → noun: 大きい → 騒音 (big noise)
(2) Syntactic pieces constructed from a verb and a noun with the same content words.
noun(-particle) → verb: 子供-が → 楽しむ (a child rejoices)
verb → noun: 楽しむ → 子供 (rejoicing child)
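One way to realize the two criteria is to map both surface orders to a shared key. The sketch below is an assumption about how this could be done, not the authors' implementation: the `Word` and `Piece` types, the coarse part-of-speech labels, and the decision to ignore the particle entirely are all hypothetical choices made for illustration.

```python
from typing import NamedTuple, Tuple


class Word(NamedTuple):
    content: str   # content word, e.g. 騒音
    pos: str       # coarse part of speech: "noun", "adjective", "verb", ...


class Piece(NamedTuple):
    modifier: Word
    particle: str  # particle attached to the modifier ("" if none)
    modificand: Word


def class_key(piece: Piece) -> Tuple[str, ...]:
    """Map same class expressions to one shared key."""
    pos_pair = {piece.modifier.pos, piece.modificand.pos}
    if pos_pair in ({"noun", "adjective"}, {"noun", "verb"}):
        # Criteria (1) and (2): ignore direction and particle (an assumption),
        # keep only the two content words, noun first.
        noun, other = sorted(
            (piece.modifier, piece.modificand), key=lambda w: w.pos != "noun"
        )
        return ("same-class", noun.content, other.content)
    # Every other piece keeps its surface form.
    return (
        "surface",
        piece.modifier.content,
        piece.particle,
        piece.modificand.content,
    )


noise_is_big = Piece(Word("騒音", "noun"), "が", Word("大きい", "adjective"))
big_noise = Piece(Word("大きい", "adjective"), "", Word("騒音", "noun"))
child_rejoices = Piece(Word("子供", "noun"), "が", Word("楽しむ", "verb"))
rejoicing_child = Piece(Word("楽しむ", "verb"), "", Word("子供", "noun"))
very_big = Piece(Word("とても", "adverb"), "", Word("大きい", "adjective"))

print(class_key(noise_is_big) == class_key(big_noise))
print(class_key(child_rejoices) == class_key(rejoicing_child))
```

Counting frequencies per key instead of per surface form is what reduces the number of unique expressions.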
Method(2)
- Coping with form word -
Some phrase pairs do not have meaning.
ex.) I can be satisfied. (満足することができる)
be satisfied: 満足する → こと (manzoku-suru → koto)
There is no real modification relation in this pair.
I can be: こと-が → できる (koto-ga → dekiru)
This pair does not have any meaning.
This problem arises because “こと[koto]” is a “form word”.
A form word is a type of content word whose original meaning has weakened and which is used only formally in Japanese.
This is similar to relative pronouns such as “which”, “who”, “when” etc. in English.
• We collected form words manually.
• We treat a phrase containing a form word as a function word attached to the preceding content word.
ex.) I can be satisfied very much. (とても満足することができる)
conventional syntactic piece
satisfied very much: とても → 満足する
be satisfied: 満足する → こと
I can be: こと-が → できる
coping with form word
satisfied very much: とても → 満足する
I can be satisfied: 満足すること-が → できる
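The treatment of form words can be sketched as a small change to piece extraction. This is a simplified illustration under stated assumptions: the `Phrase` tuple and `pieces_with_form_words` are hypothetical names, `FORM_WORDS` holds only こと because it is the only form word named in the slides, and only the phrase immediately before the form word is merged into it.

```python
from typing import List, NamedTuple, Optional, Tuple

# Assumption: the real list was collected manually; only こと is named in the slides.
FORM_WORDS = {"こと"}


class Phrase(NamedTuple):
    content: str          # content part of the phrase
    function: str         # trailing particle, "" if none
    head: Optional[int]   # index of the phrase this one modifies (None = root)


def pieces_with_form_words(phrases: List[Phrase]) -> List[Tuple[str, str]]:
    """Extract pieces, attaching each form word to the preceding phrase."""
    pieces = []
    for i, phrase in enumerate(phrases):
        if phrase.head is None:
            continue
        head_content = phrases[phrase.head].content
        if head_content in FORM_WORDS:
            # e.g. 満足する → こと carries no meaning on its own, so skip it.
            continue
        content = phrase.content
        if content in FORM_WORDS and i > 0 and phrases[i - 1].head == i:
            # Prefix the form word with the phrase that modifies it,
            # e.g. 満足する + こと becomes 満足すること.
            previous = phrases[i - 1]
            content = previous.content + previous.function + content
        modifier = f"{content}-{phrase.function}" if phrase.function else content
        pieces.append((modifier, head_content))
    return pieces


# Hand-written structure for とても満足することができる (not parser output).
sentence = [
    Phrase("とても", "", 1),
    Phrase("満足する", "", 2),
    Phrase("こと", "が", 3),
    Phrase("できる", "", None),
]

pieces = pieces_with_form_words(sentence)
for modifier, modificand in pieces:
    print(f"{modifier} → {modificand}")
```

Pieces that point at the content word itself, such as とても → 満足する, are left unchanged, which matches the example above.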
Application to Sentiment Analysis
• We apply the improved syntactic piece to sentiment analysis to verify its effectiveness.
• The target of sentiment analysis is a sentence; each sentence is classified as positive, negative, or other.
1. Pairs of an evaluative expression and a semantic orientation score (SO-score) are registered in a dictionary.
Here, evaluative expression = syntactic piece.
2. Each expression in the input sentence is given an SO-score from the dictionary.
3. The sentence is classified by the sum of its SO-scores.
Sentence Classification
input: The noise of the fan is big. (ファンの騒音が大きい。)
1. Syntactic pieces are obtained from the input.
obtained syntactic pieces:
noise of fan (ファン-の → 騒音)
noise is big (騒音-が → 大きい)
2. The obtained syntactic pieces are matched against the entries of the dictionary.
dictionary (SO of syntactic piece):
noise is big: negative (騒音-が → 大きい)
3. From the match, “noise is big” (騒音が大きい) is treated as negative.
4. The SO of the input is therefore negative.
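The classification step can be sketched as a dictionary lookup followed by a sum. The numeric score of -1.0 and the rule that a sum of exactly zero means "other" are assumptions for this example; the slides only state that the sentence is classified by the sum of its SO-scores.

```python
from typing import Dict, List, Tuple

Piece = Tuple[str, str]  # (modifier, modificand)


def classify(pieces: List[Piece], so_dictionary: Dict[Piece, float]) -> str:
    """Classify a sentence by the sum of the SO-scores of its pieces."""
    # Pieces that are not in the dictionary contribute nothing.
    total = sum(so_dictionary.get(piece, 0.0) for piece in pieces)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"  # assumption: a zero sum means no orientation was found


# Hypothetical dictionary entry; the score value is invented for illustration.
so_dictionary = {("騒音-が", "大きい"): -1.0}

# Pieces obtained from ファンの騒音が大きい。
input_pieces = [("ファン-の", "騒音"), ("騒音-が", "大きい")]

label = classify(input_pieces, so_dictionary)
print(label)
```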
Reason for Applying Sentiment Analysis
• This method uses a dictionary, so if we have the SO-score of the expression “noise is big”, same class expressions let us give the same SO-score to “big noise”.
• The dictionary should not contain expressions that do not have meaning, such as “I can be” registered as “positive”. Coping with form words prevents this.
Preparation for Sentence Classification
- Making of Seed Dictionary -
training data (positive sentences, negative sentences) → seed dictionary
1. We prepare positive and negative sentences as training data.
2. Syntactic pieces are obtained from the training data, and their frequencies are counted.
syntactic piece    positive  negative
size is big        5         1
slow to respond    0         8
softly-colored     3         0
・
・
3. Each syntactic piece is given an SO-score, and we treat the result as the seed dictionary.
The SO-score is calculated from the probability of occurrence (Fujimura et al. [04]).
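The seed dictionary construction can be sketched as counting followed by scoring. The scoring formula below, a normalized difference of per-class occurrence probabilities, is an assumption chosen for illustration; the slides cite Fujimura et al. for the actual SO-score and do not reproduce the formula. The function name and the toy sentences are also hypothetical.

```python
from collections import Counter
from typing import Dict, List

Sentence = List[str]  # a sentence represented by its syntactic pieces


def build_seed_dictionary(
    positive: List[Sentence], negative: List[Sentence]
) -> Dict[str, float]:
    """Count pieces per class and turn the counts into scores in [-1, 1]."""
    pos_counts = Counter(piece for sentence in positive for piece in sentence)
    neg_counts = Counter(piece for sentence in negative for piece in sentence)
    n_pos = max(len(positive), 1)
    n_neg = max(len(negative), 1)
    dictionary = {}
    for piece in set(pos_counts) | set(neg_counts):
        p_pos = pos_counts[piece] / n_pos  # occurrence rate in positive data
        p_neg = neg_counts[piece] / n_neg  # occurrence rate in negative data
        # Assumed stand-in formula, not necessarily the one in the cited work.
        dictionary[piece] = (p_pos - p_neg) / (p_pos + p_neg)
    return dictionary


# Toy training data, invented for illustration.
positive_sentences = [["good design"], ["good design", "size is big"]]
negative_sentences = [["slow to respond"], ["size is big"]]

seed = build_seed_dictionary(positive_sentences, negative_sentences)
for piece, score in sorted(seed.items()):
    print(piece, score)
```

Dividing by the number of sentences in each class keeps the score from being skewed when the positive and negative sets differ in size.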
Preparation for Sentence Classification
- Expansion of Dictionary -
The more evaluative expressions the dictionary contains, the better.
This requires a huge amount of training data, which is costly to prepare manually.
We want to obtain training data automatically, so we make an expanded dictionary.
seed dictionary + large scale corpus → new training data (positive, negative) → expanded dictionary
1. Sentences from the large scale corpus are classified as positive or negative using the seed dictionary. We treat the result as new training data.
2. Syntactic pieces are obtained from the new training data and their frequencies are counted, in the same way as for the seed dictionary.
syntactic piece          positive  negative
continuing is difficult  0         5
good design              8         0
to be gift               5         1
・
・
3. Each syntactic piece is also given a semantic orientation score, and we treat the result as the expanded dictionary.
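The expansion step is a form of bootstrapping and can be sketched as follows. As before, the scoring formula is an assumed stand-in rather than the one from the cited work, and the function names, the toy seed dictionary, and the toy corpus are hypothetical. Whether the expanded dictionary replaces the seed dictionary or is merged with it is not stated in the slides; this sketch simply returns the newly built one.

```python
from collections import Counter
from typing import Dict, List

Sentence = List[str]  # a sentence represented by its syntactic pieces


def sentence_score(sentence: Sentence, dictionary: Dict[str, float]) -> float:
    """Sum the SO-scores of the pieces found in the dictionary."""
    return sum(dictionary.get(piece, 0.0) for piece in sentence)


def so_scores(positive: List[Sentence], negative: List[Sentence]) -> Dict[str, float]:
    """Assumed stand-in scoring: normalized difference of occurrence rates."""
    pos_counts = Counter(p for s in positive for p in s)
    neg_counts = Counter(p for s in negative for p in s)
    n_pos, n_neg = max(len(positive), 1), max(len(negative), 1)
    scores = {}
    for piece in set(pos_counts) | set(neg_counts):
        p_pos, p_neg = pos_counts[piece] / n_pos, neg_counts[piece] / n_neg
        scores[piece] = (p_pos - p_neg) / (p_pos + p_neg)
    return scores


def expand_dictionary(seed: Dict[str, float], corpus: List[Sentence]) -> Dict[str, float]:
    """Label corpus sentences with the seed dictionary, then rebuild scores."""
    new_positive = [s for s in corpus if sentence_score(s, seed) > 0]
    new_negative = [s for s in corpus if sentence_score(s, seed) < 0]
    # Sentences with a zero score are left out of the new training data.
    return so_scores(new_positive, new_negative)


seed = {"good design": 1.0, "slow to respond": -1.0}
corpus = [
    ["good design", "to be gift"],
    ["slow to respond", "continuing is difficult"],
    ["unrelated piece"],
]

expanded = expand_dictionary(seed, corpus)
for piece, score in sorted(expanded.items()):
    print(piece, score)
```

Pieces that co-occur with seed entries, such as "to be gift" here, gain a score even though they were never labeled by hand.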
Experiment
• We manually prepared:
● approximately 2,000 positive sentences
● approximately 1,000 negative sentences
● approximately 210,000 sentences as a large scale corpus for expansion
• To examine the efficacy of each method, we ran sentiment analysis under the following settings.
(1) Using only generalization of same class expressions
(2) Using only coping with form word
(3) Combination of (1) and (2)
(4) Using conventional syntactic piece (baseline)
Result
language processing units                            recall(%)  precision(%)
(1) only generalization of same class expressions    49.8       77.1
(2) only coping with form word                       44.6       77.3
(3) (1)+(2)                                          47.7       78.7
(4) Baseline                                         47.1       75.5
・All methods improve precision over the baseline.
・Generalization of same class expressions also improves recall.
Discussion
- Generalization of Same Class Expression -
• Recall was higher than the baseline.
We could give a semantic orientation score to more sentences, and the expanded dictionary grew.
We obtained approximately 14,000 more sentences of new training data than with the conventional syntactic piece (an increase of approximately 5.7%).
Discussion
- Coping with form word -
We tried to solve the problem of extracting phrase pairs that do not have meaning.
The results show that, with the conventional syntactic piece, some sentences were answered correctly only by accident.
In the dictionary using the conventional syntactic piece
• “think that (なる-と → 思う[naru-to → omou])” is given a positive score.
This expression does not have a semantic orientation.
In the dictionary using our method
• “think it will be a nuisance (邪魔になる-と → 思う[jama ni naru-to → omou])” is given a negative score.
• “think it will make a present (プレゼントになる-と → 思う[present ni naru-to → omou])” is given a positive score.
Our method can treat the semantic orientation of each expression.
Discussion
- Comparison with other language processing unit -
language processing units      recall(%)  precision(%)
Using same class expressions   49.8       77.1
word 2-gram                    78.8       79.9
word 3-gram                    75.3       78.0
Recall is lower than with word 2-gram and word 3-gram.
Conclusion
• We proposed two methods for improving the syntactic piece.
• We applied them to sentiment analysis to verify the effectiveness of the improved syntactic piece.
• As a result, the recall and precision of the improved syntactic piece were higher than those of the conventional one.
• However, it is still inferior to word 2-gram and 3-gram.
• In future work, we intend to improve recall.
Thank you.
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Syntactic Piece: Idea, Purpose and Application to Sentiment Analysis

  • 1. Kazuki TAKIGAWA and Kazuhide YAMAMOTO, Department of Electrical Engineering, Nagaoka University of Technology, JAPAN. {takigawa,yamamoto}@jnlp.org
  • 2. Background: The main processing units have some problems in Japanese. • bag-of-words: It is difficult to identify the sense of an expression. Ex.) “かける [kakeru]” has several meanings: “do up”, “put on”, “take out”, and so on. • word n-gram: It often creates unnecessary elements. Ex.) “で-ある-こと [de-aru-koto]” (3-gram). A processing unit that can preserve the meaning of an expression is needed.
  • 3. Background: Main processing units in NLP. • bag-of-words: It is difficult to identify the sense of an expression. Ex.) the word “かける [kakeru]”. • word n-gram: It often creates unnecessary elements. Ex.) “が, かける” (2-gram), “で, ある, こと” (3-gram). A processing unit that can preserve the meaning of an expression is needed. We propose the “syntactic piece”.
  • 4. What’s Syntactic Piece? • A syntactic piece is a minimum unit of syntactic structure. • It consists of a pair of modifier and modificand, derived from the syntactic structure. • This pair is expressed as: modifier → modificand. Example: “Recently, immediate noise is very big.” (最近まわりの騒音がとても大きい) yields the syntactic pieces: recently big (最近→大きい); very big (とても→大きい); immediate noise (まわりの→騒音); noise is big (騒音が→大きい).
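
The definition on slide 4 can be illustrated with a minimal Python sketch. It assumes the sentence has already been split into phrases with known dependency links; the `Chunk` type, its field names, and the hand-written parse are illustrative assumptions, not the authors' implementation or the output of a real parser.

```python
from typing import List, NamedTuple, Optional


class Chunk(NamedTuple):
    """One phrase of a parsed sentence (hypothetical representation)."""
    content: str          # content word(s) of the phrase
    function: str         # trailing particle(s), "" if none
    head: Optional[int]   # index of the phrase this one modifies; None for the root


def syntactic_pieces(chunks: List[Chunk]) -> List[str]:
    """Return one "modifier→modificand" string per dependency link.

    The modifier keeps its particle; the modificand is shown by its
    content word only, as in the slide's examples.
    """
    return [
        f"{c.content}{c.function}→{chunks[c.head].content}"
        for c in chunks
        if c.head is not None
    ]


# Hand-written parse of 最近まわりの騒音がとても大きい (an assumption).
sentence = [
    Chunk("最近", "", 4),
    Chunk("まわり", "の", 2),
    Chunk("騒音", "が", 4),
    Chunk("とても", "", 4),
    Chunk("大きい", "", None),
]
pieces = syntactic_pieces(sentence)
```

Under these assumptions, `pieces` contains the four pairs listed on the slide, in sentence order.
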
  • 5. Advantages of Syntactic Piece. Very simple: it is as easy to use as an n-gram. It has syntactic structure: it contains more information than an n-gram. Similar to a phrasal idiom: it can deal with a chunk of meaning.
  • 6. Advantages of Syntactic Piece. However, the syntactic piece also has some problems.
  • 7. Problems of Syntactic Piece. 1) A syntactic piece tends to be long because it is a pair of phrases, so using syntactic pieces yields many unique expressions. 2) The phrase pairs generated by the current method include some pairs that carry no meaning. We propose solutions to these problems.
  • 8. Method (1): Generalization of Same Class Expressions. We generalize “same class expressions” to reduce the number of unique expressions. “Same class expressions” are a set of expressions that have similar meanings even though their surface forms differ. 1. cake is delicious (ケーキ-が → おいしい) 2. delicious cake (おいしい → ケーキ). The surface structures of these two expressions differ, but their meanings are very similar. We call such expressions “same class expressions”.
  • 9. Method (1): Generalization of Same Class Expressions. We generalize same class expressions, which are defined by two criteria.
  • 10. Method (1): Generalization of Same Class Expressions. Criterion (1): syntactic pieces constructed from an adjective and a noun with the same content words. noun(-particle) → adjective: 騒音-が → 大きい (noise is big). adjective → noun: 大きい → 騒音 (big noise).
  • 11. Method (1): Generalization of Same Class Expressions. Criterion (2): syntactic pieces constructed from a verb and a noun with the same content words. noun(-particle) → verb: 子供-が → 楽しむ (a child rejoices). verb → noun: 楽しむ → 子供 (rejoicing child).
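
One way to picture the two criteria is to map both surface forms of a same class expression to a shared key. The sketch below is only an illustration: the `Piece` type, the part-of-speech labels, the choice of canonical key, and the decision to ignore every particle are assumptions, since the slides do not say how the generalized form is stored.

```python
from typing import NamedTuple, Tuple


class Piece(NamedTuple):
    """A syntactic piece with part-of-speech labels (hypothetical representation)."""
    mod_word: str    # content word of the modifier
    mod_pos: str     # "noun", "adjective", or "verb"
    particle: str    # particle attached to the modifier, "" if none
    head_word: str   # content word of the modificand
    head_pos: str


def class_key(p: Piece) -> Tuple[str, ...]:
    """Return a key shared by all members of one same class expression."""
    pos_pair = {p.mod_pos, p.head_pos}
    if pos_pair in ({"noun", "adjective"}, {"noun", "verb"}):
        # Criteria (1) and (2): ignore direction and particle (assumption),
        # and key on the (predicate, noun) content words.
        if p.mod_pos == "noun":
            noun, predicate = p.mod_word, p.head_word
        else:
            noun, predicate = p.head_word, p.mod_word
        return ("class", predicate, noun)
    # All other pieces keep their surface form.
    return ("plain", p.mod_word, p.particle, p.head_word)


noise_is_big = Piece("騒音", "noun", "が", "大きい", "adjective")
big_noise = Piece("大きい", "adjective", "", "騒音", "noun")
child_rejoices = Piece("子供", "noun", "が", "楽しむ", "verb")
rejoicing_child = Piece("楽しむ", "verb", "", "子供", "noun")
```

With this key, “noise is big” and “big noise” fall into one class, as do “a child rejoices” and “rejoicing child”, so a dictionary indexed by `class_key` holds one entry per class instead of one per surface form.
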
  • 12. Method (2): Coping with Form Words. Some phrase pairs carry no meaning. Example: “I can be satisfied.” (満足することができる) yields: be satisfied (満足する → こと [manzoku-suru koto]); I can be (こと-が → できる [koto-ga dekiru]).
  • 13. Method (2): Coping with Form Words. Annotation on the pairs above: there is no modification relation.
  • 14. Method (2): Coping with Form Words. Annotations on the pairs above: there is no modification relation; there is no meaning.
  • 15. Method (2): Coping with Form Words. This problem arises from how “こと [koto]”, a “form word”, is treated. A form word is a type of content word that has lost its original meaning and is used only formally in Japanese. It is similar to English relative pronouns such as “which”, “who”, and “when”.
  • 16. Method (2): Coping with Form Words. • We collected form words manually. • We treat a phrase containing a form word as a function word attached to the preceding content word. Example: “I can be satisfied very much.” (とても満足することができる). Conventional syntactic pieces: satisfied very much (とても → 満足する); be satisfied (満足する → こと); I can be (こと-が → できる).
  • 17. Method (2): Coping with Form Words. With coping with form words, the conventional pieces be satisfied (満足する → こと) and I can be (こと-が → できる) are merged into one piece: I can be satisfied (満足すること-が → できる). The piece satisfied very much (とても → 満足する) stays as it is.
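
The merge shown on slides 16 and 17 can be sketched as follows. This is an illustration under stated assumptions: the `Chunk` type, the one-entry `FORM_WORDS` set (the slides only say the list was collected manually), and the hand-written parse are all hypothetical, and chains of several form words are not handled.

```python
from typing import List, NamedTuple, Optional, Set


class Chunk(NamedTuple):
    """One phrase of a parsed sentence (hypothetical representation)."""
    content: str
    function: str         # trailing particle(s), "" if none
    head: Optional[int]   # index of the modified phrase; None for the root


FORM_WORDS: Set[str] = {"こと"}  # assumed sample; the real list is not given


def modifier_text(content: str, function: str) -> str:
    return content + ("-" + function if function else "")


def pieces_with_form_words(chunks: List[Chunk]) -> List[str]:
    """Extract pieces, folding form-word phrases into the phrase before them."""
    pieces = []
    for c in chunks:
        if c.head is None or c.content in FORM_WORDS:
            continue  # the root has no head; form-word phrases are absorbed
        head = chunks[c.head]
        if head.content in FORM_WORDS and head.head is not None:
            # The form-word phrase acts as function words of this content word,
            # and the merged phrase modifies whatever the form word modified.
            merged = c.content + c.function + head.content
            target = chunks[head.head].content
            pieces.append(f"{modifier_text(merged, head.function)} → {target}")
        else:
            pieces.append(f"{modifier_text(c.content, c.function)} → {head.content}")
    return pieces


# Hand-written parse of とても満足することができる (an assumption).
sentence = [
    Chunk("とても", "", 1),
    Chunk("満足する", "", 2),
    Chunk("こと", "が", 3),
    Chunk("できる", "", None),
]
pieces = pieces_with_form_words(sentence)
```

For this input, the two meaningless pairs disappear and `pieces` holds “とても → 満足する” and “満足すること-が → できる”, matching slide 17.
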
  • 18. Application to Sentiment Analysis. • We apply the improved syntactic piece to sentiment analysis to verify its effectiveness. • The target of sentiment analysis is a sentence, which is classified as positive, negative, or other. 1. Pairs of an evaluative expression and its semantic orientation score (SO-score) are registered in a dictionary. Here, evaluative expression = syntactic piece. 2. Each expression in the input sentence is given an SO-score from the dictionary. 3. The sentence is classified by the sum of its SO-scores.
  • 19. Sentence Classification. Input: “noise of fan is big.” (ファンの騒音が大きい。) Obtained syntactic pieces: noise of fan (ファン-の → 騒音); noise is big (騒音-が → 大きい). Dictionary (SO of syntactic piece): noise is big (騒音-が → 大きい): negative. Matching: noise is big (騒音が大きい) is negative. Result: the input is negative.
  • 20. Sentence Classification. Syntactic pieces are obtained from the input.
  • 21. Sentence Classification. The obtained syntactic pieces are matched against the entries of the dictionary.
  • 22. Sentence Classification. We can treat “noise is big” as negative.
  • 23. Sentence Classification. The SO of the input is negative.
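
The classification walk-through on slides 18 to 23 amounts to a dictionary lookup followed by a sum. The sketch below assumes numeric SO-scores and a threshold of zero for “other”; the slides state neither the score values nor how “other” is decided, so both are assumptions, as is the `classify` function name.

```python
from typing import Dict, Iterable


def classify(pieces: Iterable[str], so_dict: Dict[str, float]) -> str:
    """Classify a sentence by the sum of the SO-scores of its syntactic pieces.

    Pieces missing from the dictionary contribute nothing. A sum of zero is
    labeled "other" (an assumed rule, not stated in the slides).
    """
    total = sum(so_dict.get(piece, 0.0) for piece in pieces)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "other"


# Dictionary with one entry; the value -1.0 is an illustrative score.
so_dict = {"騒音-が → 大きい": -1.0}

# Pieces obtained from ファンの騒音が大きい。
input_pieces = ["ファン-の → 騒音", "騒音-が → 大きい"]
label = classify(input_pieces, so_dict)
```

Here only “noise is big” matches the dictionary, so the sum is negative and the input is labeled negative, as on slide 23. A sentence with no matching piece falls into “other”.
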
  • 24. Reason for Applying to Sentiment Analysis. • This method uses a dictionary, so if we have the SO-score of the expression “noise is big”, we can also give that SO-score to “big noise” through same class expressions. • The dictionary should not contain an expression that carries no meaning, such as “I can be” registered as “positive”; coping with form words avoids such entries.
  • 25. Preparation for Sentence Classification: Making of Seed Dictionary. Training data (positive sentences and negative sentences) → seed dictionary. Seed dictionary entries (syntactic piece: positive frequency, negative frequency): size is big: 5, 1; slow to respond: 0, 8; softly-colored: 3, 0; and so on.
  • 26. Preparation for Sentence Classification: Making of Seed Dictionary. We prepare positive and negative sentences as training data.
  • 27. Preparation for Sentence Classification: Making of Seed Dictionary. Syntactic pieces are obtained from the training data, and their frequencies are calculated.
  • 28. Preparation for Sentence Classification: Making of Seed Dictionary. Each syntactic piece is given an SO-score, and we treat the result as the seed dictionary.
  • 29. Preparation for Sentence Classification: Making of Seed Dictionary. The SO-score is calculated from the probability of occurrence (Fujimura et al. [04]).
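
The seed-dictionary steps on slides 25 to 29 can be sketched as counting pieces per class and turning the counts into a score. The slides do not give the formula of Fujimura et al., so the normalized difference of relative frequencies used below is a stand-in chosen for illustration, not the cited method; the function name and the toy sentences are also assumptions.

```python
from collections import Counter
from typing import Dict, List

Sentence = List[str]  # a sentence represented by its syntactic pieces


def build_dictionary(pos_sents: List[Sentence],
                     neg_sents: List[Sentence]) -> Dict[str, float]:
    """Count pieces in each class and assign a score in [-1, 1].

    score = (p_pos - p_neg) / (p_pos + p_neg), where p_* is the piece's
    frequency divided by the number of sentences in that class.
    This formula is an illustrative assumption.
    """
    pos_freq = Counter(p for s in pos_sents for p in s)
    neg_freq = Counter(p for s in neg_sents for p in s)
    n_pos = max(len(pos_sents), 1)
    n_neg = max(len(neg_sents), 1)
    scores = {}
    for piece in pos_freq.keys() | neg_freq.keys():
        p_pos = pos_freq[piece] / n_pos
        p_neg = neg_freq[piece] / n_neg
        scores[piece] = (p_pos - p_neg) / (p_pos + p_neg)
    return scores


# Toy training data (invented for illustration).
positive_sentences = [["softly-colored"], ["size is big"]]
negative_sentences = [["slow to respond"], ["slow to respond", "size is big"]]
seed = build_dictionary(positive_sentences, negative_sentences)
```

A piece seen only in positive sentences scores 1.0, one seen only in negative sentences scores -1.0, and one seen equally often in both scores 0.0.
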
  • 30. Preparation for Sentence Classification: Expansion of Dictionary. The more evaluative expressions, the better. This requires a huge amount of training data, which is costly to prepare manually. We want to obtain training data automatically, so we build an expanded dictionary.
  • 31. Preparation for Sentence Classification: Expansion of Dictionary. Diagram: large scale corpus and seed dictionary → new training data (positive, negative) → we obtain syntactic pieces → expanded dictionary. Expanded dictionary entries (syntactic piece: positive frequency, negative frequency): continuing is difficult: 0, 5; good design: 8, 0; to be gift: 5, 1; and so on.
  • 32. Preparation for Sentence Classification: Expansion of Dictionary. Sentences from the corpus are classified as positive or negative with the seed dictionary. We treat the result as new training data.
  • 33. Preparation for Sentence Classification: Expansion of Dictionary. Syntactic pieces are obtained from the new training data, and their frequencies are calculated as in making the seed dictionary.
  • 34. Preparation for Sentence Classification: Expansion of Dictionary. Semantic orientation scores are also calculated, and we treat the result as the expanded dictionary.
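
The expansion loop on slides 30 to 34 can be sketched as one round of bootstrapping: label corpus sentences with the seed dictionary, then rebuild a dictionary from the labeled sentences. The scoring formula, the zero threshold, the function names, and the toy data are assumptions for illustration; the slides do not say whether the expanded dictionary also retains the seed entries, and this sketch does not merge them.

```python
from collections import Counter
from typing import Dict, List

Sentence = List[str]  # a sentence represented by its syntactic pieces


def score_pieces(pos_sents: List[Sentence],
                 neg_sents: List[Sentence]) -> Dict[str, float]:
    """Assumed scoring: normalized difference of per-class relative frequencies."""
    pos_freq = Counter(p for s in pos_sents for p in s)
    neg_freq = Counter(p for s in neg_sents for p in s)
    n_pos, n_neg = max(len(pos_sents), 1), max(len(neg_sents), 1)
    return {
        piece: (pos_freq[piece] / n_pos - neg_freq[piece] / n_neg)
               / (pos_freq[piece] / n_pos + neg_freq[piece] / n_neg)
        for piece in pos_freq.keys() | neg_freq.keys()
    }


def expand(seed: Dict[str, float], corpus: List[Sentence]) -> Dict[str, float]:
    """Label corpus sentences with the seed dictionary, then re-score all pieces."""
    def total(sentence: Sentence) -> float:
        return sum(seed.get(piece, 0.0) for piece in sentence)

    new_pos = [s for s in corpus if total(s) > 0]   # new positive training data
    new_neg = [s for s in corpus if total(s) < 0]   # new negative training data
    return score_pieces(new_pos, new_neg)           # sentences summing to 0 are dropped


# Toy seed dictionary and corpus (invented for illustration).
seed = {"good design": 1.0, "continuing is difficult": -1.0}
corpus = [
    ["good design", "to be gift"],
    ["continuing is difficult", "slow to respond"],
    ["unrelated piece"],
]
expanded = expand(seed, corpus)
```

Pieces that co-occur with seed entries, such as “to be gift”, receive a score in the expanded dictionary, while sentences the seed dictionary cannot label contribute nothing.
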
  • 35. Experiment. • We manually prepared: ● approximately 2,000 positive sentences ● approximately 1,000 negative sentences ● approximately 210,000 sentences as a large scale corpus for expansion. • To examine the efficacy of each of our methods, we analyzed sentiment using the following settings: (1) only generalization of same class expressions; (2) only coping with form words; (3) combination of (1) and (2); (4) conventional syntactic piece (baseline).
  • 36. Result (language processing unit: recall (%), precision (%)). (1) only generalization of same class expressions: 49.8, 77.1. (2) only coping with form words: 44.6, 77.3. (3) (1)+(2): 47.7, 78.7. (4) baseline: 47.1, 75.5. ・All methods improve precision over the baseline. ・Generalization of same class expressions also improves recall.
  • 37. Discussion: Generalization of Same Class Expressions. • Recall was higher than the baseline. We could give semantic orientation scores to more sentences, and the scale of the expanded dictionary increased. We obtained approximately 14,000 more sentences (an increase of approximately 5.7%) as new training data than with the conventional syntactic piece.
  • 38. Discussion: Coping with Form Words. We tried to solve the problem of extracting phrase pairs that carry no meaning. The results show that some sentences were answered correctly only by accident when using the conventional syntactic piece. In the dictionary using the conventional syntactic piece: • “think that” (なる,と → 思う [naru-to → omou]) is given a positive score. This expression has no semantic orientation.
  • 39. Discussion: Coping with Form Words. Our method can handle the semantic orientation of each expression. In the dictionary using our method: • “think it will be a hindrance” (邪魔になる-と → 思う [jama ni naru-to → omou]) is given a negative score. • “think it will become a present” (プレゼントになる-と → 思う [present ni naru-to → omou]) is given a positive score.
  • 40. Discussion: Comparison with Other Language Processing Units (language processing unit: recall (%), precision (%)). Using same class expressions: 49.8, 77.1. word 3-gram: 75.3, 78.0. word 2-gram: 78.8, 79.9. Recall is lower than that of word 2-gram and word 3-gram.
  • 41. Conclusion. • We proposed two methods for improving the syntactic piece. • We applied the improved syntactic piece to sentiment analysis to verify its effectiveness. • As a result, the recall and precision of the improved syntactic piece were higher than those of the conventional one. • It is still inferior to word 2-gram and 3-gram. • In future work, we intend to improve recall.