12. Example text
The ship may have sunk but the movie didn't!!! Director, James Cameron, from 'The Terminator' did
it again with this amazing picture. One of my favorite scenes is 'The Dinner table' scene, in which
Rose's family and friends meet Jack after he saves her. Rose has a look on her face that every
woman should have when you meet 'THE ONE'...I hope I have that look when I am in the room with
my future husband.<br /><br />Jack and Rose have a connection that is 'MOVIE STUFF' but it's good
movie stuff. We have the greedy mom and all her elite stuck up associates who live off of their
husbands wealth. Rose almost commits suicide but the Gilbert Grape star rescues her. I really liked
the hanging over the boat scene. It was a good risk.<br /><br />The movie is long but it's fantastic!!!
Good story, good flow, good actors!!! Go see it twice if you want, Its worth it!!!
Source:
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011).
Learning Word Vectors for Sentiment Analysis.
The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
18. Example sentences
1. This movie is terrible. It’s a waste of time.
2. This movie was good and made me happy. Had a very good time.
3. This movie is just boring.
Preprocessing (lowercasing, stop-word removal):
1. [movie, terrible, waste, time]
2. [movie, good, made, happy, good, time]
3. [movie, boring]
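The preprocessing step can be sketched in Python as follows. This is a minimal sketch, not the slide author's actual pipeline. The regular-expression tokenizer is an assumption. The stop-word set `STOP_WORDS` is hypothetical: it was picked only so that the output reproduces the three token lists above.

```python
import re

# Hypothetical stop-word list: chosen only so that the output reproduces the
# token lists on this slide. Real pipelines usually use a larger, standard list.
STOP_WORDS = {"this", "is", "it's", "a", "of", "was", "and", "me", "had", "very", "just"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    text = text.lower().replace("\u2019", "'")  # normalize the curly apostrophe in "It’s"
    tokens = re.findall(r"[a-z']+", text)  # runs of letters and apostrophes
    return [t for t in tokens if t not in STOP_WORDS]


texts = [
    "This movie is terrible. It\u2019s a waste of time.",
    "This movie was good and made me happy. Had a very good time.",
    "This movie is just boring.",
]

for i, text in enumerate(texts, start=1):
    print(i, preprocess(text))
```

Running it prints `1 ['movie', 'terrible', 'waste', 'time']` for the first sentence, and likewise for the other two.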
19. Word counts (bag of words)
• Use the raw number of occurrences of each word in each text directly as the feature value
        boring  good  happy  made  movie  terrible  time  waste
Text 1       0     0      0     0      1         1     1      1
Text 2       0     2      1     1      1         0     1      0
Text 3       1     0      0     0      1         0     0      0
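A minimal sketch of the bag-of-words counts. It takes the preprocessed token lists from the example as input and uses an alphabetically sorted vocabulary, which happens to match the column order of the table. The function name `bag_of_words` is illustrative only.

```python
from collections import Counter

# Token lists after preprocessing (Texts 1-3 from the example sentences).
docs = [
    ["movie", "terrible", "waste", "time"],
    ["movie", "good", "made", "happy", "good", "time"],
    ["movie", "boring"],
]

# Vocabulary sorted alphabetically, matching the table's column order.
vocab = sorted({w for doc in docs for w in doc})


def bag_of_words(doc: list[str], vocab: list[str]) -> list[int]:
    """Return the raw count of every vocabulary word in `doc`."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]  # a Counter returns 0 for absent words


print(vocab)
for i, doc in enumerate(docs, start=1):
    print(f"Text {i}", bag_of_words(doc, vocab))
```

Each printed row is one text's feature vector. Word order inside a text is discarded, which is why this representation is called a "bag" of words.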
20. Occurrence frequency (TF: Term Frequency)
• Use each word's number of occurrences in a text, divided by the total number of words in that text, as the feature value
• $\mathrm{TF}_{d,w} = \dfrac{\text{number of occurrences of word } w \text{ in text } d}{\text{total number of words in text } d}$
        boring  good  happy  made  movie  terrible  time  waste
Text 1    0.00  0.00   0.00  0.00   0.25      0.25  0.25   0.25
Text 2    0.00  0.33   0.17  0.17   0.17      0.00  0.17   0.00
Text 3    0.50  0.00   0.00  0.00   0.50      0.00  0.00   0.00
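The TF formula can be sketched the same way. This is a minimal sketch that reuses the example's token lists and an alphabetical vocabulary; `term_frequency` is an illustrative name. Values are rounded to two decimals for display, as in the table.

```python
from collections import Counter

# Token lists after preprocessing (Texts 1-3 from the example sentences).
docs = [
    ["movie", "terrible", "waste", "time"],
    ["movie", "good", "made", "happy", "good", "time"],
    ["movie", "boring"],
]
vocab = sorted({w for doc in docs for w in doc})


def term_frequency(doc: list[str], vocab: list[str]) -> list[float]:
    """TF_{d,w}: count of w in text d divided by the number of words in d."""
    counts = Counter(doc)
    total = len(doc)
    return [counts[w] / total for w in vocab]


for i, doc in enumerate(docs, start=1):
    print(f"Text {i}", [round(x, 2) for x in term_frequency(doc, vocab)])
```

Dividing by the text length makes short and long texts comparable. Each row of TF values sums to 1.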
21. TF-IDF (IDF: Inverse Document Frequency)
• Adjusts TF by lowering the weight of low-"rarity" words, i.e., words that appear in almost every text
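• $\mathrm{IDF}_{w} = \log \dfrac{\text{total number of texts}}{\text{number of texts in which word } w \text{ appears}}$
• $\mathrm{TFIDF}_{d,w} = \mathrm{TF}_{d,w} \times \mathrm{IDF}_{w}$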
        boring  good  happy  made  movie  terrible  time  waste
IDF       0.48  0.48   0.48  0.48   0.00      0.48  0.18   0.48
Text 1    0.00  0.00   0.00  0.00   0.00      0.12  0.04   0.12
Text 2    0.00  0.16   0.08  0.08   0.00      0.00  0.03   0.00
Text 3    0.24  0.00   0.00  0.00   0.00      0.00  0.00   0.00
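A minimal TF-IDF sketch following the two formulas above. It assumes a base-10 logarithm, because that base gives IDF = log10(3) ≈ 0.48 for words that occur in only one of the three texts. Under the same assumption, `time` appears in two texts and gets log10(3/2) ≈ 0.18, while `movie` appears in all three and gets log10(1) = 0. The function names are illustrative. Libraries such as scikit-learn use smoothed IDF variants by default, so their numbers differ.

```python
import math
from collections import Counter

# Token lists after preprocessing (Texts 1-3 from the example sentences).
docs = [
    ["movie", "terrible", "waste", "time"],
    ["movie", "good", "made", "happy", "good", "time"],
    ["movie", "boring"],
]
vocab = sorted({w for doc in docs for w in doc})


def term_frequency(doc, vocab):
    """TF_{d,w}: count of w in text d divided by the number of words in d."""
    counts = Counter(doc)
    return [counts[w] / len(doc) for w in vocab]


def inverse_document_frequency(docs, vocab):
    """IDF_w = log10(N / df_w); the base-10 logarithm is an assumption."""
    doc_sets = [set(doc) for doc in docs]
    n_docs = len(docs)
    idf = []
    for w in vocab:
        df = sum(1 for s in doc_sets if w in s)  # number of texts containing w
        idf.append(math.log10(n_docs / df))
    return idf


def tf_idf(doc, vocab, idf):
    """TFIDF_{d,w} = TF_{d,w} * IDF_w."""
    return [tf * w_idf for tf, w_idf in zip(term_frequency(doc, vocab), idf)]


idf = inverse_document_frequency(docs, vocab)
print("IDF", [round(x, 2) for x in idf])
for i, doc in enumerate(docs, start=1):
    print(f"Text {i}", [round(x, 2) for x in tf_idf(doc, vocab, idf)])
```

Because `movie` occurs in every text, its IDF is 0 and it drops out of every TF-IDF vector. This is exactly the down-weighting of low-rarity words described above.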
23. word2vec
• Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013)
• The model is trained on words that co-occur (appear near one another)
• Relationships between vectors can themselves be expressed as vectors
• Paris - France + Japan = ?
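The two ideas on this slide can be illustrated with a small sketch; it is not word2vec itself.

- **Co-occurrence pairs.** `skipgram_pairs` lists the (center word, context word) pairs within a window of neighbouring positions. These are the kind of co-occurrence examples the skip-gram variant of word2vec learns from. The actual training, a shallow neural network with techniques such as negative sampling, is not shown. The `window` parameter is a hypothetical choice.
- **Vector analogy.** `analogy` performs the vector arithmetic from the slide and returns the nearest remaining word by cosine similarity. The 3-dimensional `TOY_VECTORS` are hand-made and hypothetical, not trained embeddings.

```python
import math

Vector = list[float]


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """(center, context) pairs for every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a: str, b: str, c: str, vectors: dict[str, Vector]) -> str:
    """Word whose vector is most similar to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


# Hand-made toy vectors (hypothetical, NOT output of a trained word2vec model).
# Dimensions: [France-related, Japan-related, capital-city]
TOY_VECTORS = {
    "Paris": [1.0, 0.0, 1.0],
    "France": [1.0, 0.0, 0.0],
    "Japan": [0.0, 1.0, 0.0],
    "Tokyo": [0.0, 1.0, 1.0],
    "Osaka": [0.0, 1.0, 0.2],
}

print(skipgram_pairs(["movie", "good", "made", "happy"], window=1))
print(analogy("Paris", "France", "Japan", TOY_VECTORS))  # Tokyo
```

With these toy vectors, Paris - France + Japan lands exactly on `Tokyo`. With real trained embeddings the intended answer is also "Tokyo", but the nearest neighbour is only approximate and not guaranteed.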
24. Other word embedding models
• GloVe: Global Vectors for Word Representation
• Pennington et al., 2014
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
• Devlin et al., 2018