Concept Cloud-based Sentiment Visualization for Financial Reviews
1. Concept Cloud-based
Sentiment Visualization for
Financial Reviews
Tomoki Ito*, Kota Tsubouchi**, Hiroki Sakaji*,
Tatsuo Yamashita**, Kiyoshi Izumi*
* Graduate School of Engineering, The University of Tokyo
** Yahoo Japan Corporation
2. Background
• Online reviews are useful for investment decision making.
• e.g., micro-blogs, SNS, and news articles…
4. Difficulty in Reading a Large Volume of Reviews
• Reading all the posts is not practical
• the volume of posts is sometimes very large.
• A framework for visualizing a summary of
financial reviews is necessary
5. What is important for investment
decision making?
• In the decision-making process, the following
two types of sentiment are important
• Word-level sentiment score
• Concept-level sentiment score
8. Word-level sentiment
• Word-level sentiment is the sentiment score assigned to
each word
• The score assignment should take context into account
• e.g., the sentiment shift caused by “not”
In total, we are in a bull market.
0.1 0.1 0.2 0.0 0.3 1.2 -0.1
In total, we are not in a bull market.
0.1 0.1 0.2 0.0 0.3 -0.1 -0.1
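The polarity shift in the example above can be illustrated with a deliberately simple rule. This is only a toy sketch: the lexicon values, the `NEGATORS` set, and the "flip everything after a negator" rule are assumptions made for illustration. It is not the method proposed in these slides, which derives contextual scores with LRP on an LSTM.

```python
# Toy illustration of context-dependent word scores (NOT the CCSV method).
# ORIGINAL_SCORES and NEGATORS are hypothetical values chosen for this example.
ORIGINAL_SCORES = {"bull": 1.2, "bear": -1.1}
NEGATORS = {"not", "never", "no"}


def contextual_scores(tokens):
    """Return (token, score) pairs; words after a negator get their sign flipped."""
    negated = False
    scored = []
    for tok in tokens:
        word = tok.lower().strip(".,!?")
        if word in NEGATORS:
            negated = True
            scored.append((tok, 0.0))
            continue
        score = ORIGINAL_SCORES.get(word, 0.0)
        scored.append((tok, -score if negated else score))
    return scored


plain = dict(contextual_scores("In total, we are in a bull market.".split()))
shifted = dict(contextual_scores("In total, we are not in a bull market.".split()))
print(plain["bull"], shifted["bull"])
```

The same word "bull" receives a positive score in the first sentence and a negative one in the second, which is the behavior a dictionary of fixed scores cannot reproduce.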
10. Concept-level Sentiment
• Concept-level sentiment is the sentiment score assigned to
each concept
• A concept is a set of similar words
Example (concepts extracted from reviews):
Trend (Up, Down, ↗︎↗︎): 0.5
Taste (Delicious, Nasty, Palatable, …): -0.1
Cleanness (Clean, Dirty): -0.2
12. Purpose
• This study aims to develop a method for visualizing
• Word-level sentiment score, and
• Concept-level sentiment score
at the same time in a user-friendly way
13. Our Approach
• We propose a novel text-visualization framework
called CCSV
Concept Cloud-based Sentiment
Visualization
14. CCSV Example
• Using CCSV, we can summarize reviews as follows
(The price reversed, lol)
(The price reversed.)
(Go down below 3000 yen.
I cannot buy now.)
…
(Over 1000 reviews in five days)
15. CCSV Example
• Using CCSV, we can summarize reviews as follows
Text-visualization results for a set of reviews of trading company X posted
between September 25, 2017 and September 30, 2017, extracted from the
Yahoo Financial Micro-blogs.
• Color
• Red: Positive
• Blue: Negative
• Size: Volume of sentiment
16. Contribution
Our contributions are summarized as follows
• We propose a novel text-visualization framework
called CCSV
• We experimentally evaluated the validity of
CCSV using real datasets
18. Concept Cloud-based Sentiment
Visualization
• CCSV consists of the following three parts
1. Word-level sentiment extraction
2. Concept-level sentiment extraction
3. Word- and concept-level sentiment visualization
20. Word-level sentiment Extraction
• This step addresses the following contextual
word-level sentiment score assignment task
Input: In total, we are not in a bull market.
Original word-level sentiment (sentiment score before considering context):
In total, we are not in a bull market.
(Sentiment influence: the polarity of “bull” is shifted by “not”)
Contextual word-level sentiment (sentiment score after considering context):
In total, we are not in a bull market.
(Red and blue words have positive and negative sentiments, respectively)
21. Task Setting
• This step aims to assign word-level sentiment scores
using only a text corpus dataset of reviews
and their positive or negative sentiment tags
Text corpus dataset (example):
Review: In total, we are in a bull market.
Tag: Positive
• We chose this task setting for its
practicality
22. Previous approach in Word-level
sentiment Extraction
• Previous works [Vo 2016, Li 2017] address this task by
automatically building a word sentiment score
dictionary
• However, they cannot take context into account
Input: In total, we are not in a bull market.
In total, we are not in a bull market.
Cannot consider contexts
23. Our approach
• We solve this task by estimating
• P (•) : Original word-level sentiment
• R (•) : Contextual word-level sentiment
using the LRP method [L. Arras et al., 2017] with the RNN model
Document dataset: $\{D_i\}_{i=1}^{N}$ where $D_i = \{w_i^{t}\}_{i=1}^{N}$
Sentiment tag:
27. LRP-based Approach Process
• We estimate R (•) as follows
1. Train an RNN model with LSTM cells on a text corpus
dataset of documents and their positive or negative
sentiment tags
2. Estimate
• R (•) : Contextual word-level sentiment
using the LRP method [L. Arras et al., 2017] with the RNN model
In total, we are in a bull market.
R (•) : LRP + RNN (LSTM)
0.1 0.1 0.2 0.0 0.3 1.2 -0.1
28. Layer-wise Relevance Propagation (LRP)
• LRP is a method for interpreting neural networks
• LRP calculates the relevance score of each input value to the output
value
• LRP can be applied to an RNN with LSTM cells
• The relevance score of each term obtained from LRP with the RNN is
expected to reflect context
[Figure: input words “market”, “is”, “bull” and output classes Positive / Negative (L. Arras et al., 2017)]
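To make the idea of a relevance score concrete, here is a minimal sketch of the LRP epsilon rule for a single linear layer without bias. The weights, inputs, and `EPS` value are made-up illustration values, and the special handling of LSTM gates described by Arras et al. is omitted, so this is not the full procedure used in CCSV.

```python
# Minimal sketch of the LRP epsilon rule for ONE linear layer y = W x (no bias).
# All numbers below are hypothetical illustration values.
EPS = 1e-6


def lrp_linear(x, W, relevance_out, eps=EPS):
    """Redistribute each output's relevance to the inputs in proportion to W[j][i] * x[i]."""
    relevance_in = [0.0] * len(x)
    for j, r_j in enumerate(relevance_out):
        contributions = [W[j][i] * x[i] for i in range(len(x))]
        z_j = sum(contributions)
        # The stabilizer keeps the denominator away from zero.
        denom = z_j + (eps if z_j >= 0 else -eps)
        for i, z_ij in enumerate(contributions):
            relevance_in[i] += z_ij / denom * r_j
    return relevance_in


x = [1.0, 0.5, -2.0]        # hypothetical input activations
W = [[0.2, -0.4, 0.1],      # weights feeding output neuron 0
     [0.7, 0.3, -0.5]]      # weights feeding output neuron 1
y = [sum(w * v for w, v in zip(row, x)) for row in W]

# Explain output neuron 1 only: its activation is the starting relevance.
R_in = lrp_linear(x, W, [0.0, y[1]])
print(R_in, sum(R_in), y[1])
```

The input relevances sum (up to the stabilizer) to the explained output, which is why summing word-level relevances can later be compared with the review-level polarity.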
29. Concept Cloud-based Sentiment
Visualization
• CCSV consists of the following three parts
1. Word-level sentiment extraction
• using the LRP method
2. Concept-level sentiment extraction
• using the word-level sentiments and the K-means
clustering method
3. Word- and concept-level sentiment visualization
• using the Word Cloud method
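Step 2 can be sketched as follows: cluster words into concepts with K-means, then aggregate the word-level scores within each cluster. The 2-D word vectors are made up, the arrow symbol from the slides is written as "rise", and summing the scores per cluster is an assumption (it matches the numbers in the slide-32 example). The slides do not state which word features are clustered.

```python
# Sketch: group words into concepts with k-means, then sum word scores per concept.
# WORD_VECTORS are made-up 2-D points; using word embeddings here is an assumption.
import math

WORD_VECTORS = {
    "up": (0.0, 0.1), "down": (0.1, 0.0), "rise": (0.05, 0.05),
    "delicious": (5.0, 5.1), "nasty": (5.1, 5.0), "palatable": (4.9, 5.0),
    "clean": (10.0, 0.1), "dirty": (10.1, 0.0),
}
WORD_SCORES = {  # word-level scores from the slide-32 example
    "up": 0.5, "down": -0.2, "rise": 0.6,
    "delicious": 0.6, "nasty": -0.7, "palatable": 1.5,
    "clean": 0.3, "dirty": -0.2,
}


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def concept_scores(vectors, scores, k):
    """Map each cluster (as a tuple of words) to the sum of its word scores."""
    words = list(vectors)
    labels = kmeans([vectors[w] for w in words], k)
    concepts = {}
    for word, label in zip(words, labels):
        concepts.setdefault(label, []).append(word)
    return {tuple(ws): round(sum(scores[w] for w in ws), 2) for ws in concepts.values()}


result = concept_scores(WORD_VECTORS, WORD_SCORES, k=3)
for words, score in result.items():
    print(words, score)
```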
32. Word and Concept-level sentiment Visualization
• This step visualizes the word-level and concept-level
sentiment scores using the Tag Cloud approach
Word-level scores → concept-level score (sum):
Up: 0.5, Down: -0.2, ↗︎↗︎: 0.6 → 0.9
Delicious: 0.6, Nasty: -0.7, Palatable: +1.5 → +1.4
Clean: +0.3, Dirty: -0.2 → +0.1
• Color
• Red: Positive Blue: Negative
• Size: Volume of Sentiment
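The color and size legend can be expressed as a small mapping from scores to tag styles. This is a sketch only: the point-size range, the linear scaling, and the gray color for a zero score are assumptions, and no particular word-cloud library is implied.

```python
# Sketch: turn sentiment scores into tag-cloud styling (color by sign, size by magnitude).
# MIN_PT, MAX_PT, the linear scaling, and "gray" for zero are illustrative assumptions.
MIN_PT, MAX_PT = 10, 40


def tag_style(scores):
    """Return {word: {"color": ..., "font_size": ...}} for a dict of sentiment scores."""
    largest = max(abs(s) for s in scores.values()) or 1.0
    styles = {}
    for word, score in scores.items():
        color = "red" if score > 0 else "blue" if score < 0 else "gray"
        size = MIN_PT + (MAX_PT - MIN_PT) * abs(score) / largest
        styles[word] = {"color": color, "font_size": round(size, 1)}
    return styles


styles = tag_style({"Up": 0.5, "Down": -0.2, "Palatable": 1.5, "Nasty": -0.7})
for word, style in styles.items():
    print(word, style)
```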
34. Experimental Evaluation
• We evaluated our method from two aspects using
real textual datasets
• Original Sentiment assignment property
• Contextual sentiment assignment property
35. Dataset
• We evaluated the validity of our approach using the following
datasets
• Text Corpus
• Economic dataset: Current economy watchers survey
• Train: 20,000 positive posts and 20,000 negative posts
• Valid: 2,000 positive posts and 2,000 negative posts
• Test: 4,000 positive posts and 4,000 negative posts
• Yahoo dataset: Yahoo Finance micro-blogs between
September
• Train: 30,612 positive posts and 9,388 negative posts
• Valid: 3,387 positive posts and 1,613 negative posts
• Test: 7,538 positive posts and 2,462 negative posts
37. Original Sentiment assignment property
• How accurately P (•) represents the
positive or negative polarity of each
term in the word polarity list
• Economic word polarity list
• 348 positive and 391 negative words
• We used this list when we estimated P (•)
using the Economic dataset
• Yahoo word polarity list
• 422 positive and 372 negative words
• We used this list when we estimated P (•)
using the Yahoo dataset
Word polarity list (example):
Good: Positive
Bad: Negative
Great: Positive
Bullish: Positive
…
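One simple way to check this property is to compare the sign of P (•) with the label in the word polarity list. The sketch below assumes plain accuracy as the metric and uses made-up P values; the slides do not state which metric was actually reported.

```python
# Sketch: how often the sign of P(word) agrees with a word polarity list.
# The P values are made up, and plain accuracy is an assumed metric.
def polarity_accuracy(p_scores, polarity_list):
    """Fraction of listed words whose score sign matches the listed polarity."""
    covered = [w for w in polarity_list if w in p_scores]
    if not covered:
        return 0.0
    hits = sum(
        1 for w in covered
        if ("Positive" if p_scores[w] > 0 else "Negative") == polarity_list[w]
    )
    return hits / len(covered)


polarity_list = {"good": "Positive", "bad": "Negative",
                 "great": "Positive", "bullish": "Positive"}
p_scores = {"good": 0.8, "bad": -0.6, "great": 1.1, "bullish": -0.2}

accuracy = polarity_accuracy(p_scores, polarity_list)
print(accuracy)
```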
38. Comparison Method
• We compared our method with the following
methods
• Word-level sentiment score assignment methods
• PMI
• FLW [D. T. Vo et al., 2016]
• SONN [Q. Li et al., 2017]
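For reference, a common PMI-style word score is PMI(w, positive) minus PMI(w, negative), which reduces to log p(w | positive) − log p(w | negative). The sketch below implements that variant with add-one smoothing on a made-up corpus; the exact PMI formulation used as the baseline in this study is not given in the slides, so treat this as one plausible reading.

```python
# Sketch of a PMI-style word polarity score from tagged reviews.
# The toy corpus, the tag names, and the add-one smoothing are assumptions.
import math
from collections import Counter


def pmi_scores(reviews, smoothing=1.0):
    """reviews: list of (tokens, tag) pairs with tag in {"pos", "neg"}."""
    word_tag = Counter()
    tag_tokens = Counter()
    vocab = set()
    for tokens, tag in reviews:
        for tok in tokens:
            word_tag[(tok, tag)] += 1
            tag_tokens[tag] += 1
            vocab.add(tok)
    v = len(vocab)
    scores = {}
    for w in vocab:
        p_w_pos = (word_tag[(w, "pos")] + smoothing) / (tag_tokens["pos"] + smoothing * v)
        p_w_neg = (word_tag[(w, "neg")] + smoothing) / (tag_tokens["neg"] + smoothing * v)
        # PMI(w, pos) - PMI(w, neg): the p(w) term cancels out.
        scores[w] = math.log(p_w_pos / p_w_neg)
    return scores


reviews = [
    (["bull", "market"], "pos"),
    (["strong", "bull"], "pos"),
    (["bear", "market"], "neg"),
    (["weak", "bear"], "neg"),
]
scores = pmi_scores(reviews)
print(sorted(scores.items(), key=lambda kv: kv[1]))
```

Because each word gets one fixed score, this kind of baseline cannot change the score of "bull" when it follows "not".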
41. Contextual Sentiment assignment property
• How accurately the sum of the contextual word-level sentiment
scores of the terms in each review in the test dataset represents the
positive or negative polarity of the review
In total, we are in a bull market.
R (•) : LRP + RNN (LSTM)
0.1 0.1 0.2 0.0 0.3 1.2 -0.1
0.1 + 0.1 + 0.2 + 0.0 + 0.3 + 1.2 + (-0.1) = 1.8
→ Positive (accurate?)
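The check on this slide can be written as a short evaluation loop: sum the word-level scores of a review, predict Positive when the sum is above zero, and compare with the tag. The threshold of zero for ties and the two extra reviews are assumptions added for illustration; only the first score list comes from the slide.

```python
# Sketch: predict review polarity from the sum of word-level scores and measure accuracy.
# Treating a sum of exactly zero as Negative is an assumption.
def predict_polarity(word_scores):
    return "Positive" if sum(word_scores) > 0 else "Negative"


def review_accuracy(scored_reviews):
    """scored_reviews: list of (word_scores, tag) pairs."""
    correct = sum(1 for scores, tag in scored_reviews if predict_polarity(scores) == tag)
    return correct / len(scored_reviews)


slide_scores = [0.1, 0.1, 0.2, 0.0, 0.3, 1.2, -0.1]  # from the slide
total = round(sum(slide_scores), 1)

test_reviews = [
    (slide_scores, "Positive"),
    ([0.2, -0.9], "Negative"),   # hypothetical review
    ([0.4, 0.1], "Negative"),    # hypothetical review, predicted wrongly
]
print(total, predict_polarity(slide_scores), review_accuracy(test_reviews))
```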
42. Comparison Method
• We compared our method with the following
methods
• Word-level sentiment score assignment methods
• PMI
• FLW [D. T. Vo et al., 2016]
• SONN [Q. Li et al., 2017]
• LR: Logistic Regression
• RNN with LSTM cells
44. CCSV Example
• Using CCSV, we can summarize reviews as follows
Text-visualization results for a set of reviews of trading company X posted between September 25,
2017 and September 30, 2017, extracted from the Yahoo Financial Micro-blogs.
• Useful for the decision-making process in investment
• Color
• Red: Positive
• Blue: Negative
• Size: Volume of sentiment
45. Conclusion
• Summary
• We propose a novel text-visualization
framework called CCSV
• We experimentally evaluated the validity of
the CCSV
• Future work
• We will make the CCSV more user-friendly
• We will apply this approach to other languages
52. Layer-wise Relevance Propagation (LRP)
• LRP calculates the relevance score of each input value to the output value by
• starting from the output layer of the neural network and
• backpropagating the relevance up to the input layer.
[Figure: input words “market”, “is”, “bull” and output classes Positive / Negative (L. Arras et al., 2017)]
55. LRP-based Approach Process
• We estimate R (•) as follows
1. Train an RNN model with LSTM cells on a text corpus
dataset of documents and their positive or negative
sentiment tags
Text corpus dataset (example):
Review: In total, we are in a bull market.
Tag: Positive