2. Visiting a new city…
Online opinions: Which hotel to stay at?
3. Visiting a new city…
Online opinions: What attractions to visit?
Without opinions, decision making becomes difficult!
4. ODSS Components
1. Data: a comprehensive set of opinions to support search and analysis capabilities
2. Analysis Tools: tools to help digest opinions (e.g. summaries, opinion trend visualization)
3. Search Capabilities: ability to find entities using existing opinions
4. Presentation: putting it all together – an easy way for users to explore the results of the search and analysis components (e.g. organizing and summarizing results)
Focus of existing work: opinion summarization via structured summaries
1. Sentiment summary (e.g. +ve/-ve on a piece of text)
2. Fine-grained sentiment summary (e.g. Battery life: 2 stars; Audio: 1 star)
Not a complete solution to support decision making based on opinions!
5. ODSS Components (repeats the four components above)
Need to address a broader set of problems to enable opinion-driven decision support
6. We need data: a large number of online opinions
Allow users to get a complete and unbiased picture
▪ Opinions are very subjective and can vary a lot
Currently: no study on how to systematically collect opinions from the web
7. We need different analysis tools
To help users analyze & digest opinions
▪ Sentiment trend visualization
▪ fluctuation over time
▪ Aspect-level sentiment summaries
▪ Textual summaries, etc.
Currently: the focus is on structured summarization
8. We need to incorporate search
Allow users to find different items or entities based on existing opinions
This can improve user productivity: it cuts down on the time spent reading a large number of opinions
9. We also need to know how to organize &
present opinions at hand effectively
Aspect level summaries:
▪ How to organize these summaries?
▪ Scores or Visuals (stars)?
▪ Do you show supporting phrases?
Full opinions:
▪ How to allow effective browsing of reviews/opinions?
▪ Don't overwhelm users
10. ODSS Components
1. Data: a comprehensive set of opinions to support opinion-based search & analysis tasks
2. Analysis Tools: tools to help analyze & digest opinions (e.g. summaries, opinion trend visualization)
3. Search Capabilities: find items/entities based on existing opinions (e.g. show "clean" hotels only)
4. Presentation: organizing opinions to support effective decision making
11. Focus of the proposed methods:
1. Should be general: works across different domains & possibly content types
2. Should be practical & lightweight: can be integrated into existing applications; can potentially scale up to large amounts of data
13. Currently: no direct way of finding entities based on online opinions
Need to read opinions about different entities to find those that fulfill personal criteria
Time consuming & impairs user productivity!
14. Use existing opinions to rank entities based on
a set of unstructured user preferences
Finding a hotel: “clean rooms, good service”
Finding a restaurant: “authentic food, good ambience”
15. Use the results of existing opinion mining methods:
Find sentiment ratings on different aspects
Rank entities based on the discovered aspect ratings
Problem: not practical!
Costly – must mine large amounts of textual content
Needs prior knowledge of the set of queryable aspects
Most existing methods rely on supervision
▪ E.g. the overall user rating
16. Use existing text retrieval models for ranking entities based on preferences:
Can scale up to large amounts of textual content
Can be tuned and extended
Do not require costly IE or text mining
17. Investigate the use of text retrieval models for Opinion-Based Entity Ranking
Compare 3 state-of-the-art retrieval models:
BM25, PL2, DirichletLM – shown to work best for text retrieval tasks
Which one works best for this ranking task?
Explore some extensions over existing IR models
Can ranking improve with these extensions?
Compile the first test set & propose an evaluation method for this new ranking task
19. Standard retrieval cannot distinguish multiple preferences in a query
E.g. Query: "clean rooms, cheap, good service"
Treated as one long keyword query, but it actually holds 3 preferences
Problem: an entity may score highly by matching one aspect extremely well
To address this problem (see the sketch below):
Score each preference separately – multiple queries
Combine the results of each query – different strategies:
▪ Score combination (works best)
▪ Average rank
▪ Min rank
▪ Max rank
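Below is a minimal, hedged sketch of this idea: split the preference query, score each preference with any retrieval model, and combine per-entity scores (the score-combination strategy). Here `retrieval_score` is only a stand-in for a real model such as BM25, PL2, or DirichletLM, and the hotel data is hypothetical.

```python
# Sketch of Query Aspect Modeling (QAM) with score combination.
from collections import defaultdict

def retrieval_score(entity_reviews: str, query: str) -> float:
    # Placeholder scorer: swap in a real retrieval model (BM25/PL2/DirichletLM).
    return float(sum(entity_reviews.lower().count(w) for w in query.lower().split()))

def qam_rank(entities: dict, preference_query: str) -> list:
    preferences = [p.strip() for p in preference_query.split(",")]
    combined = defaultdict(float)
    for pref in preferences:                    # one query per preference
        for entity, reviews in entities.items():
            combined[entity] += retrieval_score(reviews, pref)  # score combination
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

hotels = {"Hotel A": "very clean rooms but pricey and slow service",
          "Hotel B": "clean rooms, cheap rates, good friendly service"}
print(qam_rank(hotels, "clean rooms, cheap, good service"))  # Hotel B ranks first
```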
20. In standard retrieval, matching an opinion word and matching a topic word are not distinguished
Opinion-Based Entity Ranking: it is important to match the opinion words in the query
▪ opinion words have more variation than topic words
▪ E.g. great: excellent, good, fantastic, terrific…
Intuition (sketched below):
▪ Expand a query with similar opinion words
▪ Helps emphasize the matching of opinions
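A minimal sketch of the expansion idea follows. The synonym table is a hand-made stand-in (the thesis derives similar opinion words differently), so both the mapping and the names are illustrative.

```python
# Sketch of opinion-word query expansion: add similar opinion words so that
# matching of opinions is emphasized during retrieval.
OPINION_SYNONYMS = {
    "great": ["excellent", "good", "fantastic", "terrific"],
    "clean": ["spotless", "tidy", "immaculate"],
}

def expand_query(query: str) -> str:
    expanded = []
    for word in query.lower().split():
        expanded.append(word)
        expanded.extend(OPINION_SYNONYMS.get(word, []))  # only opinion words expand
    return " ".join(expanded)

print(expand_query("great clean rooms"))
```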
22. [Bar charts: % improvement in ranking for the Hotels domain (y-axis 0–8%) and the Cars domain (y-axis 0–2.5%), over PL2, LM, and BM25. One series shows the improvement using QAM, the other using QAM + OpinExp. Callout: with QAM alone, any model can be used.]
25. Current methods: focus on generating structured summaries of opinions [Lu et al., 2009; Lerman et al., 2009; …]
[Figure: opinion summary for iPod]
26. We need supporting textual summaries!
To know more, one would have to read many redundant sentences
[Figure: opinion summary for iPod]
27. Summarize the major opinions
What are the major complaints/praise in the text?
Concise
◦ Easily digestible
◦ Viewable on a smaller screen
Readable
◦ Easily understood
28. Extractive summarization: widely studied for years
[Radev et al. 2000; Erkan & Radev, 2004; Mihalcea & Tarau, 2004, …]
Not suitable for generating concise summaries
Biased: with a limit on summary size,
▪ the selected sentences may miss critical information
Verbose: sentences are not shortened
We need more of an abstractive approach
29. 2 Abstractive Summarization Methods
Opinosis
- Graph-based summarization framework
- Relies on structural redundancies in sentences
WebNgram
- Optimization framework based on readability & representativeness scoring
- Phrases generated by combining words in the original text
31. Input: a set of sentences (topic-specific, POS-annotated)
Step 1: Generate a graph representation of the text (the Opinosis-Graph)
[Figure: Opinosis-Graph over the example sentences, with word nodes such as "my", "phone", "calls", "drop", "frequently", "the", "iphone", "great", "device".]
32. Step 2: Find promising paths (candidate summaries) & score the candidates
[Figure: same Opinosis-Graph with two highlighted candidate summaries, "calls drop frequently" and "great device", with scores 3.2 and 2.5.]
33. Step 3: Select the top-scoring candidates as the final summary
Final summary: "The iPhone is a great device, but calls drop frequently."
[Figure: recap of the full pipeline – Step 1: generate the Opinosis-Graph; Step 2: find & score candidate summaries; Step 3: select the top candidates.]
34. Assume 2 sentences about the "call quality of iPhone":
1. My phone calls drop frequently with the iPhone.
2. Great device, but the calls drop too frequently.
35. • One node for each unique word + POS combination
• SID and PID (sentence id : position id) maintained at each node
• Edges indicate the relationship between words in a sentence
[Figure: Opinosis-Graph for the two sentences; e.g. node "drop" carries labels 1:4 and 2:7 (word 4 of sentence 1, word 7 of sentence 2), and the shared node "." carries 1:9 and 2:10.]
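A minimal sketch of Step 1 under simplifying assumptions (plain word tokens stand in for word+POS pairs, and edges only record adjacency):

```python
# Sketch of Opinosis-Graph construction: one node per unique word, each node
# keeping (sentence id, position id) labels; edges link adjacent words.
from collections import defaultdict

def build_opinosis_graph(sentences: list):
    nodes = defaultdict(list)   # word -> [(sid, pid), ...]
    edges = set()               # (word, next_word) adjacency
    for sid, sentence in enumerate(sentences, start=1):
        words = sentence.lower().split()
        for pid, word in enumerate(words, start=1):
            nodes[word].append((sid, pid))
            if pid > 1:
                edges.add((words[pid - 2], word))
    return nodes, edges

nodes, edges = build_opinosis_graph([
    "my phone calls drop frequently with the iphone .",
    "great device , but the calls drop too frequently .",
])
print(nodes["drop"])   # [(1, 4), (2, 7)] -- one shared node across both sentences
```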
41. Two sentences:
Calls drop frequently with the iPhone
Calls drop frequently with the Black Berry
[Figure: merged graph path "calls drop frequently with the", followed by a fan-out to "iphone" and "black berry".]
One common high-redundancy path followed by a high fan-out can be used to merge sentences:
"calls drop frequently with the iphone and black berry"
42. Input: topic-specific sentences from user reviews
Evaluation measure: automatic ROUGE evaluation
45. Use existing words in the original text to generate micropinion summaries: a set of short phrases
Emphasis on 3 aspects:
Compactness – use as few words as possible
Representativeness – reflect the major opinions in the text
Readability – fairly well formed
52. Measure used: the standard Jaccard similarity measure (sketched below)
Why important?
Allows the user to control the amount of redundancy
E.g. a user who wants good coverage of information on a small device can request less redundancy!
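A minimal sketch of this redundancy control, with an illustrative (not thesis-specified) similarity threshold:

```python
# Jaccard similarity between candidate phrases, used to cap redundancy.
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def filter_redundant(phrases: list, max_sim: float = 0.3) -> list:
    kept = []
    for p in phrases:
        if all(jaccard(p, q) <= max_sim for q in kept):
            kept.append(p)  # keep only phrases dissimilar to those already kept
    return kept

print(filter_redundant(["calls drop frequently", "calls drop often", "great battery life"]))
# -> ['calls drop frequently', 'great battery life']
```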
53. Purpose: measure how well a phrase represents opinions from the original text
2 properties of a highly representative phrase:
1. Its words should be strongly associated in the text
2. Its words should be sufficiently frequent in the text
Captured by a modified pointwise mutual information (PMI) function that adds the frequency of co-occurrence within a window:

$$\mathrm{pmi}'(w_i, w_j) = \log_2 \frac{p(w_i, w_j)\, c(w_i, w_j)}{p(w_i)\, p(w_j)}$$
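A minimal sketch of the modified PMI above; the window handling and tokenization are assumptions made for illustration:

```python
# Modified PMI: c(wi, wj) counts co-occurrences of the pair within a window,
# boosting frequent pairs that plain PMI would treat the same as rare ones.
import math
from collections import Counter

def modified_pmi(tokens: list, wi: str, wj: str, window: int = 3) -> float:
    n = len(tokens)
    unigrams = Counter(tokens)
    cooc = sum(1 for k, t in enumerate(tokens)
               if t == wi and wj in tokens[k + 1 : k + 1 + window])
    if cooc == 0:
        return float("-inf")
    p_i, p_j, p_ij = unigrams[wi] / n, unigrams[wj] / n, cooc / n
    return math.log2((p_ij * cooc) / (p_i * p_j))

text = "calls drop frequently the calls drop too frequently".split()
print(modified_pmi(text, "calls", "drop"))  # strongly associated, frequent pair
```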
54. Purpose: measure the well-formedness of a phrase
Readability scoring:
Use Microsoft's Web N-gram model (publicly available)
Obtain conditional probabilities of phrases
Intuition: a readable phrase occurs more frequently on the web than a non-readable phrase

$$S_{read}(w_k \ldots w_n) = \frac{1}{K} \sum_{q=k}^{n} \log p(w_q \mid w_1 \ldots w_{q-1})$$

(the chain rule computes the joint probability of the phrase in terms of conditional probabilities, averaged over its K words)
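A minimal sketch of this scoring; `cond_prob` is a stand-in for an n-gram language model lookup (the thesis uses Microsoft's Web N-gram service), so its constant return value is purely illustrative:

```python
# Readability score: average log conditional probability of each word given
# its predecessors, i.e. the chain-rule joint probability averaged over K words.
import math

def cond_prob(word: str, history: list) -> float:
    return 0.1  # placeholder: replace with a real n-gram model lookup

def readability(phrase: str) -> float:
    words = phrase.lower().split()
    logp = sum(math.log(cond_prob(w, words[:i])) for i, w in enumerate(words))
    return logp / len(words)  # average over the K words of the phrase

print(readability("great device but calls drop frequently"))
```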
55. Input: user reviews for 330 products (CNET)
Evaluation measure: automatic ROUGE evaluation
56. [Chart: ROUGE-2 recall vs. summary size (5–30 max words) for KEA, Tfidf, Opinosis, and WebNgram. WebNgram performs the best for this task; KEA is slightly better than Tfidf; Tfidf shows the worst performance.]
57. [Same chart as slide 56, shown alongside a sample review screenshot (pros, cons, full review).]
59. No easy way to obtain a comprehensive set of opinions about an entity
Where to get opinions now?
Rely on content providers or crawl a few sources
Problem:
▪ Can result in source-specific bias
▪ Data sparseness for some entities
60. Goal: automatically crawl online reviews for arbitrary entities
E.g. cars, restaurants, doctors
We target online reviews because they represent a big portion of online opinions
61. Focused crawlers: meant to collect pages relevant to a topic
E.g. "Database Systems", "Boston Terror Attack"
Page type is not as important as content: news articles, review pages, forum pages, etc.
Most focused crawlers are supervised: they require large amounts of training data for each topic
Not suitable for review collection on arbitrary entities: needing training data for each entity will not scale up to a large number of entities
62. OpinoFetch: a focused crawler for collecting review pages on arbitrary entities
Unsupervised approach: does not require large amounts of training data
Solves the crawling problem efficiently: uses a special data structure (the FetchGraph) for relevance scoring
63. Input: a set of entities in a domain (e.g. all hotels in a city)
1. Hampton Inn Champaign…
2. I Hotel Conference Center…
3. La Quinta Inn Champaign…
4. Drury Inn
5. …
Step 1: For each entity, obtain an initial set of Candidate Review Pages (CRPs), e.g. via the search "Hampton Inn…Reviews".
64. Step 2: Expand the list of CRPs by exploring links in the neighborhood of the initial CRPs.
[Figure: expanded CRP lists, e.g. tripadvisor.com/Hotel_Review-g36806-d903…, tripadvisor.com/Hotels-g36806-Urbana_Cha…, hamptoninn3.hilton.com/en/hotels/…, tripadvisor.com/ShowUserReviews-g36806-…]
Step 3: Score the CRPs on:
• Entity relevance (Sent)
• Review page relevance (Srev)
Select pages with Srev > σrev and Sent > σent; a sketch of the overall flow follows.
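A skeleton of the three-step flow above, with stand-in stubs; the function names, defaults, and thresholds are illustrative, not the thesis code:

```python
# OpinoFetch flow: search for candidates, expand the neighborhood, then keep
# pages that pass both the review relevance and entity relevance thresholds.
def search_candidate_pages(entity_query: str, n: int = 10) -> list:
    return []  # Step 1: top-n web search results for the entity query (stub)

def expand_neighborhood(urls: list, depth: int = 3) -> list:
    return urls  # Step 2: follow links around the initial CRPs (stub)

def collect_review_pages(entities, s_rev, s_ent, sigma_rev=0.5, sigma_ent=0.5):
    relevant = {}
    for entity in entities:
        eq = f"{entity} reviews"
        crps = expand_neighborhood(search_candidate_pages(eq))
        # Step 3: keep only pages passing both relevance thresholds
        relevant[entity] = [u for u in crps
                            if s_rev(u) > sigma_rev and s_ent(u, eq) > sigma_ent]
    return relevant
```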
65. Use any general web search engine (e.g. Bing/Google), on a per-entity basis
Search engines do partial matching of entities to pages
Pages in the vicinity of the search results are more likely to be related to the entity
Entity Query format: "entity name + brand / address" + "reviews"
E.g. "Hampton Inn Champaign 1200 W University Ave Reviews"
66. Follow the top-N URLs around the vicinity of the search results
Use a URL prioritization strategy:
Bias the crawl path towards entity-related pages
Score each URL based on the similarity between (see the sketch below):
(a) URL and Entity Query, Sim(URL, EQ)
(b) Anchor text and Entity Query, Sim(Anchor, EQ)
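A minimal sketch of the prioritization score; the speaker notes describe an averaged similarity over URL tokens and anchor tokens, and the Jaccard measure and tokenizer here are simplifications:

```python
# Priority of a URL = average similarity of its URL tokens and anchor tokens
# to the entity query, biasing the crawl towards entity-related pages.
import re

def tokens(text: str) -> set:
    return set(re.split(r"[^a-z0-9]+", text.lower())) - {""}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def url_priority(url: str, anchor: str, entity_query: str) -> float:
    eq = tokens(entity_query)
    return (jaccard(tokens(url), eq) + jaccard(tokens(anchor), eq)) / 2

print(url_priority("http://tripadvisor.com/Hotel_Review-Hampton_Inn_Champaign",
                   "Hampton Inn Champaign reviews",
                   "Hampton Inn Champaign 1200 W University Ave Reviews"))
```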
67. To determine if a page is indeed a review page, use a review vocabulary:
a lexicon with the most commonly occurring words within review pages – details in thesis
Idea: score a page based on the number of review-page words it contains:

$$S_{rev}^{raw}(p_i) = \sum_{t \in V} \log_2 c(t, p_i) \cdot wt(t)$$

$$S_{rev}(p_i) = \frac{S_{rev}^{raw}(p_i)}{normalizer}, \qquad S_{rev}(p_i) \in [0, 1]$$
68. The same scoring formulas, annotated:
- t is a term in the review vocabulary V
- c(t, pi): frequency of t in page pi (tf); the log scales down the tf
- wt(t): importance weighting of t in the review vocabulary
- $S_{rev}^{raw}(p_i)$ is the raw review page relevance score
- Normalizing yields the final review page relevance score; the normalizer is needed to set proper thresholds (see the sketch below)
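A minimal sketch of the raw score; the review vocabulary and its weights are illustrative stand-ins for the lexicon described in the thesis, and the +1 inside the log (so a single occurrence still contributes) is a small deviation made so the toy example is visible:

```python
# Raw review-page relevance: sum, over vocabulary terms found in the page,
# of log2(tf) times the term's importance weight.
import math
from collections import Counter

REVIEW_VOCAB = {"review": 1.0, "stars": 0.8, "stayed": 0.6, "rating": 0.9}

def s_rev_raw(page_text: str) -> float:
    tf = Counter(page_text.lower().split())
    return sum(math.log2(tf[t] + 1) * w
               for t, w in REVIEW_VOCAB.items() if tf[t] > 0)

print(s_rev_raw("I stayed here last week Great hotel 5 stars My review clean rooms"))
```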
69. Explored 3 normalization options (sketched below):
SiteMax (SM): the max Srevraw(pi) amongst all pages related to a particular site – normalizes based on site density
EntityMax (EM): the max Srevraw(pi) amongst all pages related to an entity – normalizes based on entity popularity
EntityMax + GlobalMax (GM) or SiteMax + GlobalMax (GM):
▪ to help with cases where SM/EM are unreliable
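A minimal sketch of these options; exactly how EM/SM combine with the global max is not specified on the slide, so the fallback below is one plausible reading, not the thesis formula:

```python
# Normalizers for the raw review relevance score.
def site_max(site_scores: list) -> float:
    return max(site_scores)        # SM: max raw score among a site's pages

def entity_max(entity_scores: list) -> float:
    return max(entity_scores)      # EM: max raw score among an entity's pages

def normalize(raw: float, local_max: float, global_max: float) -> float:
    normalizer = local_max if local_max > 0 else global_max  # GM fallback (assumed)
    return raw / normalizer

print(normalize(3.0, entity_max([3.0, 4.5, 2.1]), global_max=9.0))
```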
70. To determine if a page is about the target entity:
Based on the similarity between a page's URL & the Entity Query
Why does it work?
Most review pages have highly descriptive URLs, and the Entity Query is a detailed description of the entity
The more the URL resembles the query, the more likely it is relevant to the target entity
Similarity measure: Jaccard similarity (see the sketch below)
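A minimal sketch of entity relevance as Jaccard similarity between URL tokens and entity-query tokens; the tokenizer is an assumption:

```python
# Entity relevance: the more the URL resembles the entity query, the higher
# the score.
import re

def jaccard_tokens(a: str, b: str) -> float:
    ta = set(re.split(r"[^a-z0-9]+", a.lower())) - {""}
    tb = set(re.split(r"[^a-z0-9]+", b.lower())) - {""}
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

s_ent = jaccard_tokens("tripadvisor.com/Hotel_Review-Hampton_Inn_Champaign_Urbana",
                       "Hampton Inn Champaign 1200 W University Ave Reviews")
print(round(s_ent, 2))
```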
71. The steps proposed so far can be implemented in a variety of different ways
Our goal: make the crawling framework usable in practice
72. 1. Efficiency:
Allow review collection for a large number of entities
The task should terminate in reasonable time with reasonable accuracy
Problems arise when required information cannot be accessed quickly
▪ E.g. repeated access to the term frequencies of different pages
2. Rich Information Access (RIA):
Allow the client to access information beyond the crawled pages
E.g. get all review pages from the top 10 popular sites for entity X
A database is not suitable: it cannot naturally model such complex relationships, and queries would result in large joins
73. FetchGraph: a heterogeneous graph data structure
Models complex relationships between the different components in a data collection problem
74. [Figure: example FetchGraph. Entity nodes (Hampton Inn Champaign, I-Hotel Conference Center, Drury Inn Champaign) link to page nodes (P1…Pn); page nodes link to site nodes (tripadvisor.com, hotels.com, local.yahoo.com) and, via logical nodes for title (t), URL (u), and content (c), to term nodes (t1…tz) with weighted edges (wt). Other logical nodes include the current query Q and the review vocabulary V.]
75. [Same FetchGraph figure, annotated:]
- Entity nodes: the list of entities on which reviews are required
- Entity → page edges: based on the set of CRPs found for each entity
- Logical nodes (query, review vocabulary): at the core, made up of terms
- Term nodes: one node per unique term
76. Maintain one simple data structure:
Access to various statistics
▪ E.g. the TF of a word in a page = edge weight (content node → term node)
Access to complex relationships and global information
Compact: can be an in-memory data structure
The network can be persisted and accessed later
Client applications can use the network to answer interesting application-related questions
E.g. get all review pages for entity X from the top 10 popular sites
77. [Figure: a page's content node C (a logical node) linked to term nodes with edge weight = TF; the review vocabulary node V linked to term nodes with edge weight = importance weight; outgoing edges denote term ownership.]
To compute Srevraw(pi) (see the sketch below):
- Use the terms present in both the content node and the RV node
- TFs and weights can be obtained from the edges
- Lookup of review vocabulary words within a page is fast
- No need to parse page contents each time a page is encountered
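A minimal sketch of the FetchGraph idea for this computation: once a page is parsed, its content node stores TF edge weights to term nodes, so the raw relevance score becomes a dictionary intersection instead of a re-parse. The class shape and the +1 in the log are illustrative assumptions.

```python
# Content-node and vocabulary-node edges modeled as dictionaries.
import math

class FetchGraph:
    def __init__(self):
        self.content_edges = {}   # page id -> {term: tf} (edge weight = TF)
        self.vocab_edges = {}     # term -> importance wt (edge weight = wt)

    def add_page(self, page_id: str, text: str) -> None:
        tf = {}
        for w in text.lower().split():
            tf[w] = tf.get(w, 0) + 1
        self.content_edges[page_id] = tf      # parse once, keep edge weights

    def s_rev_raw(self, page_id: str) -> float:
        tf = self.content_edges[page_id]
        # only terms owned by both the content node and the vocabulary node
        return sum(math.log2(tf[t] + 1) * w
                   for t, w in self.vocab_edges.items() if t in tf)

g = FetchGraph()
g.vocab_edges = {"review": 1.0, "stars": 0.8}
g.add_page("p1", "my review of this hotel five stars")
print(g.s_rev_raw("p1"))  # fast lookup, no re-parsing of the page
```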
78. [Same FetchGraph figure, now with an Opinion Vocabulary logical node O alongside the current query Q. Annotation: accessing all pages connected to a site node requires the complete graph.]
79. [Same figure. Annotation: accessing all pages connected to an entity node requires the complete graph.]
81. Goal: evaluate accuracy & give insights into efficiency using the FetchGraph
Evaluated in 3 domains: electronics (5 entities), hotels (5), attractions (4)
Only 14 entities – it is expensive to obtain judgments
Gold standard:
For each entity, explore the top 50 Google results & the links around the vicinity of the results (up to depth 3)
3 human judges determine the relevance of the collected links to the entity query (crowdsourcing)
Final judgment: majority voting
82. Baseline: Google search results (deemed relevant to the entity query)
Evaluation measures:
Precision
Recall – an estimate of the coverage of review pages

$$Prec(e_k) = \frac{\#RelPages(e_k)}{\#RetrievedPages(e_k)} \qquad Recall(e_k) = \frac{\#RelPages(e_k)}{\#GoldStdRelPages(e_k)}$$
84. [Chart: recall vs. number of search results (10–50) for Google, OpinoFetch, and OpinoFetchUnnormalized; y-axis 0–0.25. Google's recall is consistently low.]
Search results are not always relevant to the EQ, or are not direct pointers to actual review pages.
85. [Same chart. OpinoFetch's recall keeps improving as more search results are used.]
- A lot of relevant content lies in the vicinity of the search results
- OpinoFetch is able to discover such relevant content
86. [Same chart. OpinoFetch achieves better recall with normalization.]
- Scores are normalized using the special normalizers (e.g. EntityMax / SiteMax)
- This makes it easier to distinguish relevant review pages
89. Avg. execution time with/without FetchGraph

                      With FetchGraph   Without FetchGraph
Srevraw(pi)           0.09 ms           8.60 ms
EntityMax normalizer  0.06 ms           4.40 s

Without FetchGraph: parse page contents each time
With FetchGraph: the page is loaded into memory once; the FetchGraph is then used to compute Srevraw(pi)
90. Avg. execution time with/without FetchGraph

                      With FetchGraph   Without FetchGraph
Srevraw(pi)           ~0.09 ms          ~8.60 ms
EntityMax normalizer  ~0.06 ms          ~4.40 s

Without FetchGraph: sets of pages must be loaded back into memory to find the EntityMax normalizer
With FetchGraph: global information is tracked till the end; only a lookup on the related sets of pages is needed to obtain the EntityMax normalizer
91. Proposed: an unsupervised, practical method for collecting reviews on arbitrary entities
Works with reasonable accuracy without requiring large amounts of training data
Proposed FetchGraph:
Helps with efficient lookup of various statistics
Useful for answering application-related queries
93. Finds & ranks entities based on user preferences
Unstructured opinion preferences – novel
Structured preferences – e.g. price, brand, etc.
Beyond search: support for analysis of entities
Ability to generate textual summaries of reviews
Ability to display tag clouds of reviews
Current version: works in the hotels domain
94. Search: find entities based on unstructured opinion preferences
Search: combine these with structured preferences
Ranking: how well are all preferences matched?
97. Summary with initial reviews:
- 26 reviews in total
- 1–2 sources
Summary with OpinoFetch reviews:
- 135 reviews (8 sources)
- Extracted with a baseline extractor
- Not all reviews were included – filtered based on:
  • the length of the review
  • the subjectivity score of the review
98. Opinion-Based Entity Ranking
Use click-through & query logs to further improve the ranking of entities
▪ Now possible since everything is logged by the demo system
Look into the use of phrasal search for ranking
▪ Limit deviation from the actual query (e.g. "close to university")
▪ Explore "back-off" style scoring – score based on the phrase, then remove the phrase restriction
99. Opinosis
How to scale up to very large amounts of text?
▪ Explore the use of the MapReduce framework
Would this approach work with other types of text?
▪ E.g. tweets, Facebook comments – shorter texts
Opinion Acquisition
Compare OpinoFetch with a supervised crawler
▪ Can it achieve comparable results?
How to improve the recall of OpinoFetch?
▪ To evaluate at a reasonable scale: approximate judgments without relying on humans?
100. References
[Barzilay and Lee 2003] Barzilay, Regina and Lillian Lee. 2003. Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 16–23, Morristown, NJ, USA.
[DeJong 1982] DeJong, Gerald F. 1982. An overview of the FRUMP system. In Lehnert, Wendy G. and Martin H. Ringle, editors, Strategies for Natural Language Processing, pages 149–176. Lawrence Erlbaum, Hillsdale, NJ.
[Erkan and Radev 2004] Erkan, Güneş and Dragomir R. Radev. 2004. LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res., 22(1):457–479.
[Harabagiu and Lacatusu 2002] Harabagiu, Sanda M. and Finley Lacatusu. 2002. Generating single and multi-document summaries with GISTexter. In Proceedings of the Workshop on Automatic Summarization, pages 30–38.
[Hu and Liu 2004] Hu, Minqing and Bing Liu. 2004. Mining and summarizing customer reviews. In KDD, pages 168–177.
[Jing and McKeown 2000] Jing, Hongyan and Kathleen R. McKeown. 2000. Cut and paste based text summarization. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, pages 178–185, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
[Lerman et al. 2009] Lerman, Kevin, Sasha Blair-Goldensohn, and Ryan McDonald. 2009. Sentiment summarization: evaluating and learning user preferences. In 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09).
[Mihalcea and Tarau 2004] Mihalcea, R. and P. Tarau. 2004. TextRank: bringing order into texts. In Proceedings of EMNLP-04, the 2004 Conference on Empirical Methods in Natural Language Processing, July.
[Pang and Lee 2004] Pang, Bo and Lillian Lee. 2004. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271–278.
[Pang et al. 2002] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79–86.
[Radev and McKeown 1998] Radev, D.R. and K. McKeown. 1998. Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469–500.
[More in Thesis Report]
Editor's Notes
I would like to thank all of you for being present for my talk despite the time differences. Today I will be presenting my thesis proposal. The title of my thesis is Opinion-Driven Decision Support System.
We use opinions on the web for all sorts of decision-making tasks. For example, when visiting a new city, we use opinions to decide which hotel to stay at, or which attractions to visit. Opinions are essential for decision making: if we are looking for a hotel in NYC, we often read many online opinions to figure out which one to stay at. Without these online opinions, the decision-making task becomes more difficult because you only have limited information to base yourself on; you may only have the price, location, and a description.
Most of the existing work leveraging opinions has focused on summarization of opinions to help users better digest them. This is mainly in the form of structured summaries, such as +ve or -ve on a given piece of text (a sentence, passage, or document), or a more fine-grained summary where you have sentiment ratings on different aspects. This alone is not sufficient, because you need other components to support effective decision making.
We actually need to address a broader set of problems to effectively support such a decision making task.
First of all, we need large amounts of opinions. Since opinions are subjective and can vary quite a bit, we need a large set of them to allow users to get the complete picture. Then we need different analysis tools: we can have sentiment trend visualization, which shows the fluctuation in sentiments over time; then we can have aspect-level summaries, textual summaries, and so on.
Then, we also need to incorporate search so that users can actually find different items and entities utilizing existing opinions. This would actually improve user productivity because it cuts down on the time spent on reading a large number of opinions.
Finally, we also need to know how to present the opinions at hand effectively. For example, if you have aspect-level summaries, you need to understand how to organize them: do you show scores or visuals like star ratings? Do you also need supporting phrases? If you have full opinions, then you need to think about how to allow effective browsing from one passage to the next, so as to not overwhelm users.
With this, I propose a framework called the ODSS, which encompasses data collection, analysis tools, search, and presentation, all of which support a more complete decision-making platform based on opinions. In my thesis I tried to solve some of the problems related to data collection, search capabilities, and analysis.
The focus of the methods proposed in this thesis is (1) to make them very general, so that the approach can work across various domains (cars, electronics) and content types (news, legal docs), and (2) to make these methods practical and lightweight, so that they can be easily applied in practice and can scale up to large amounts of data.
So first we will look at my search-related work on Opinion-Based Entity Ranking, which was published in the IRJ.
So how do you go about solving this ranking problem? This approach is usually not practical. Some of the existing methods rely on some form of supervision, such as the overall user ratings, but this kind of information may not always be available.
So we propose to leverage… So in this paper, what we did was to…
First we will look into some of the extensions that we explored. The first is modeling the aspects in the query. With the standard retrieval method… To improve this, we look into scoring each preference separately and then combining the results.
In standard text retrieval models, matching an opinion word and a standard topic word is not distinguished. But in this ranking task it is important to match opinion words in the user's query. However, opinion words generally have more variation than topic words. So the intuition is that by expanding a user query with additional equivalent opinion words, we can help in emphasizing the matching of opinion words.
So here are the results with the use of QAM & OpinExp in two domains, hotels and cars. The blue bar is the improvement with the use of QAM over standard retrieval; the red bar is the improvement with the use of OpinExp with QAM over standard retrieval. First, you see that with both extensions there is improvement in most cases, and it is especially clear with the use of OpinExp.
Then you see that with just QAM any of the retrieval models can be used – because the improvements are not that different.
But when you pair OpinExp + QAM, it is clear that BM25 is the most effective retrieval model. One reason for this is that BM25 does not over-reward high-frequency words, so an entity is not ranked highly because of matching just one of the words in the query.
Next, we will move to the analysis part, where I looked into abstractive summarization of opinions. For this, I have explored two different approaches.
Most current work in opinion summarization focuses on predicting the aspect-based ratings for an entity. For example, for an iPod, you may predict that the appearance is 5 stars, ease of use is 3 stars, etc.
But the problem is, if you wanted to know more about each of these aspects, you would actually have to read many sentences, from the thousands of reviews, to have your questions answered.
For textual summaries to be useful to users, we first require that the summary actually summarizes the major opinions, is concise so that it's viewable on smaller screens, and of course is reasonably readable.
Extractive methods can miss out information during sentence selection.
So this is how it works at a very high level, but there are many details that are outlined in the paper, such as how to stitch two different subgraphs into one and how we use positional information to find promising paths.
The Opinosis-Graph has 3 unique properties that help generate abstractive summaries; these are the key concepts used in the summarization algorithm.
There are three unique properties of the Opinosis-Graph that help in finding candidate summaries.
If you look at the words "drop" and "frequently" from sentence 2, even though the gap between the words is 2, because there is already a link between "drop" and "frequently", you can leverage this link to find more redundancies.
Here are two new sentences: "calls drop frequently with the iphone" and "calls drop frequently with the black berry". This is the resulting Opinosis-Graph. You can see that there is one high-redundancy path, followed by a high fan-out; the node "the" thus acts like a hub. Such a structure can actually be used to merge sentences into one, such as "calls drop frequently with the iphone and black berry". This kind of structure is easily discoverable using the Opinosis-Graph.
Well formed means well formed according to the language's grammatical rules, and in this work we emphasize 3 different aspects.
We try to capture these three criteria using the following optimization framework. The objective function ensures that the summaries reflect opinions from the original text and are reasonably well formed.
The objective function tries to optimize the representativeness and readability scores. Srep(mi) is the representativeness score and Sread(mi) is the readability score, where mi represents a micropinion.
Srep(mi) is the representativeness score of mi, where mi is a micropinion summary; Sread(mi) is the readability score of mi.
The first threshold constraint, which controls the maximum length of the summary, captures compactness. The constraint that controls the similarity between phrases also captures compactness by minimizing redundancy. Both these thresholds are user-adjustable, e.g., if the user can tolerate more redundancy.
To score the similarity between phrases, we simply use the Jaccard similarity measure.
Then the next is representativeness. In scoring representativeness, we have defined two properties of highly representative phrases: the words in each phrase should be strongly associated within a narrow window in the original text, and the words in each phrase should be sufficiently frequent in the original text. These two properties are captured by a modified pointwise mutual information function.
Readability scoring is to determine how well formed the constructed phrases are. Since the phrases are constructed from seed words, we can have new phrases that may not have occurred in the original text. The intuition is that if a generated phrase occurs frequently on the web, then this phrase is readable.
Moving into the evaluation part….
This graph shows the ROUGE scores of the different summarization methods for different summary lengths. KEA is a supervised keyphrase extraction model; we also have a simple tf-idf-based keyphrase extraction method, and then Opinosis, shown previously. For this task, WebNgram performs the best. Opinosis does not perform as well, most likely because there is a lack of structural redundancies within the full reviews of CNET.
Now we will change gears and move into the new work that I have done for the data collection part. This work is to be submitted to an upcoming conference.
So even though we have an abundance of existing opinions, there is no direct way of finding entities of interest based on opinions.
Since user reviews alone make up a big portion of online opinions, I would like to narrow the focus to crawling online reviews. The goal of this task is to provide an efficient method for crawling reviews, and I would like to do this by focusing on intelligently discovering a set of review-rich pages for a given entity, which would act as seeds to the actual crawler.
Goal of OpinoFetch is to be general enough to work across domains
The input is basically a set of entities on which reviews are required, e.g., all hotels in a particular city. Then for each entity, we find a set of initial candidate review pages, referred to as CRPs. This is done using a general web search engine such as Bing or Google, where the top-n results serve as the initial CRPs.
Then we expand the CRP list by exploring links in the neighborhood of the initial CRPs until a depth limit is met; this is to obtain more potential CRPs. Finally, to collect relevant pages, each CRP is scored in terms of entity relevance and review page relevance, and only those that satisfy the minimum thresholds are retained.
The first step is to find initial candidate review pages. This is done on a per-entity basis using a general web search engine like Bing or Google. The query format used is the entity name, followed by the brand or address, followed by the word "reviews". The intuition is that search engines already index most of the web, so we can leverage this to dig out the relevant review pages instead of trying to crawl the entire web.
The next step is to find more CRPs. This is done by exploring links around the initial CRPs. Different link exploration approaches have been proposed, but this is not our focus. In OpinoFetch, we use URL prioritization, where we follow the top-N URLs in a given page and stop when a certain depth is reached. The prioritization strategy is to bias the crawl path towards entity-related pages; this is achieved with a priority score assigned to each URL, based on the average similarity between the entity query and the URL tokens, and between the entity query and the anchor tokens. The intuition is that the more the anchor and URL resemble the entity query, the more likely the page is relevant to the target entity.
For review page relevance scoring, we use a review vocabulary: a lexicon consisting of the most commonly occurring words within review pages, each weighted by importance (the details of how this lexicon is constructed are outlined in the full thesis). The idea is to score a page based on the number of review-page words occurring in it; the intuition is that if a page has many of the review vocabulary terms, then this page is likely a review page.
Here, t is a term in the review vocabulary, and c(t, pi) is the frequency of t in page pi. The log is used to scale down the tf, otherwise one very frequently occurring word can dominate. wt(t) is the importance of t in the review vocabulary: the more important t is, the higher the weighting. Because the numerator is a sum of weights, this value can become quite large for highly dense review pages, so we need to normalize it to be able to adjust the score threshold.
It is unclear what the best normalizer would be for the raw review relevance score, so we explored 3 options. The first is the SiteMax normalizer, where we use the max review relevance score amongst pages from a given site; the intuition is that if a site is densely populated with reviews, the scores will be high, so non-review pages can get eliminated easily. For EntityMax: if an entity is highly popular, there will be many more reviews on that entity, resulting in a higher Srev(pi) score; if an entity is not so popular, the amount of reviews will be sparse, resulting in a lower Srev(pi) score. A review page of an unpopular entity would still receive a high score, because the maximum Srev(pi) score would not be high; the normalizer thus gets adjusted according to entity popularity. The third option is to combine EntityMax or SiteMax with the global maximum, to help with cases where EM or SM are unreliable.
Most review pages have URLs that are highly descriptive. For example, the URL for iPhone reviews on Amazon contains the name of the item within the URL itself, and the URL for reviews of the Hampton Inn on TripAdvisor contains the name of the hotel and the city. With this, entity relevance scoring is based on how similar the entity query is to the page URL; the intuition is that the more the URL resembles the query, the more likely the page is relevant to the target entity.
There are a number of ways in which we can implement the proposed steps that I just described. However, if we want to serve real applications, we need to think about what will make the approach useful in practice and useful to client applications.
First of all, we need an approach that is efficient, because our goal is to allow review collection for a large number of entities, so the task should terminate in reasonable time with reasonable accuracy. The problem usually happens when we cannot access required information fast, for example when computing term frequencies within a page for relevance scoring. The second thing that would be useful is rich information access, to enable client applications to obtain information beyond just the list of crawled pages, for example to get all review pages for entity X from the top 10 popular sites. With current methods it would be difficult to get such information, so we need a rich information representation to deduce it. Databases are not suitable for this because you will be dealing with complex joins and they do not naturally model complex information; the web graph does not model complex relationships either.
So, this is an example of a FetchGraph. As you can see, each component is represented by a simple node in the graph, and relationships are modeled using edges. You have entity nodes, site nodes, page nodes, term nodes, and logical nodes, which are conceptual nodes.
Entity nodes represent the entities on which reviews are needed, and you can have a very large set of entities (e.g., all hotels in the US); here you only have three. Entity nodes link to a set of page nodes, based on the set of CRPs found for each entity: if p1, p2, and p3 appeared as the CRPs for E1, then there would be edges from E1 to p1, p2, and p3. The pages themselves can be related to one another: for example, if one page is a near duplicate of another, the pages can be linked, or if one page is the parent of another, you can model this relationship. Next, each page can be made up of several components, such as the title, the URL, and the textual contents; this is modeled using logical nodes, which are simply nodes that represent different concepts. Each of these components is made up of terms, modeled via relationships with term nodes; each term node represents a unique word in the entire vocabulary, so you would only have one node per term. The edges to the term nodes can hold term weighting information. Then you can have other logical nodes; in OpinoFetch, we have the query and an opinion vocabulary, whose contents are captured using relationships with term nodes.
The FetchGraph has many uses. First of all, it serves as a simple data representation structure: you don't need a separate index for the terms and all the other components. Then you can access all sorts of information using this one structure. You can get various statistics; for example, you can easily get the tf of a word in a given page by just reading the edge weight connecting the content node to the relevant term node. You can also access complex relationships and global information, because the relationships between the different components are tracked over the course of data collection. Also, since this structure has the potential of being compact, it can easily be made an in-memory data structure. Since the network itself can be persisted and accessed at a later time, the client application can use it to answer interesting and important application-related questions; for example, the client can obtain all review pages for a given entity from the top 10 popular sites.
So now we will look at how to obtain the needed statistics from the FetchGraph. It is very easy to compute the raw review relevance score: the review vocabulary is modeled using a logical node where all outgoing edges to term nodes represent the terms that are part of the vocabulary, and the weights on the edges represent the importance of each term. So to compute the raw review relevance score, we look at the content node of a page and see which of the terms appear in both the content and the review vocabulary. With this, you do not need to parse page contents each time a page is encountered, and the lookup of review vocabulary words within a page is fast.
To compute the similarity between the URL and the entity query, you access the URL node of a page and the corresponding query node. The union of terms can be obtained by looking at all term nodes connected to the URL or query node; for the intersection, you look at term nodes that both are connected to.
The gold standard is the set of valid URLs for a given entity around the vicinity of the search results. Precision is computed as the number of relevant review pages divided by the total number of retrieved pages for the given entity. Recall is computed as the number of relevant pages divided by the number of relevant pages according to the gold standard.
This graph shows the recall achieved by Google, OpinoFetch, and the unnormalized version of OpinoFetch at different search result sizes. We see that the recall achieved by Google is consistently low, even with an increasing number of results, and much lower than OpinoFetch. This shows that a lot of the search results are not necessarily relevant to the entity query or are not direct pointers to review pages, and that there are many more relevant review pages around the vicinity of the search results than what the search engine deems relevant.
Then we see that with OpinoFetch, with an increasing number of search results, the recall actually improves. This shows that there is a lot of relevant content around the vicinity of the search results, and OpinoFetch is able to discover it. The actual recall is higher if you account for near-duplicate pages (where you get penalized if you don't find all versions of a page).
Next, we see that by normalizing the raw review relevance score, the recall is actually better than without normalization. One reason for this is that the scores are normalized using special normalizers like EM or SM (in this case EM), so the scores are normalized according to entity popularity, and it is easier to identify truly relevant review pages from irrelevant ones.
The next question is: what is the best way to normalize the review relevance score? This graph shows the average change in precision over not pruning the crawled pages, using different normalizers. Here we can see that EM+GM gives the best precision and SM gives the lowest. SM is the worst performing most probably because certain sites, like TripAdvisor, have reviews on different types of entities (attractions, hotels, and so on), so using the max score from the site may be unreliable for sparse entities like attractions.
Next, we look at the growth of the FetchGraph. Since we use a single network to track all information, it may seem that the FetchGraph will grow too large too fast. This graph shows the growth of the FetchGraph with respect to the number of pages crawled. It is clear that the FetchGraph's growth is actually linear in the number of pages collected, and this is without any form of optimization or compression; the growth can be further contained with more optimizations. So this shows that it is possible to have this as an in-memory data structure for different data collection problems.
Now we will look into the improvement in efficiency using the FetchGraph. The first row shows the average time to compute the raw relevance score: about 0.085 ms using the FetchGraph and 8.62 ms without it. Without the FetchGraph, you need to load a page into memory each time, parse the page, and then do the score computation; even if the page was already previously encountered, you would still have to load and parse it. With the FetchGraph, a page is loaded into memory only once, and the connections with the review vocabulary are immediately established, so from then on it is very straightforward to compute Srevraw.
Then, to compute the EM normalizer, it takes 0.056 ms using the FetchGraph and 4.39 s without it. This is because without any data structure, you basically would have to load sets of pages back into memory to find the EntityMax score normalizer. With the FetchGraph, however, you track global information, so you just need to do a lookup on the related sets of pages and obtain the max scores from that.
Now I will talk a little about the web demo system that I have developed, called Findilike, which integrates some of the ideas from this thesis. The demo was shown at WWW 2012.
Findilike finds and ranks entities based on a set of user preferences. These can be unstructured opinion preferences, which is the unique part, and also structured preferences such as price, brand, and so on. Beyond search, it supports analysis of entities in terms of textual review summaries and tag cloud visualization of reviews. The current version works in the hotels domain.
So this is the interface of Findilike. This is where you specify the unstructured preferences; in this case it is a search for clean hotels. This is where you specify the structured preferences, such as the distance from a particular location; in this case, the location specified is Universal Studios in LA. This is the ranking of hotels based on the preferences specified, so you have opinion preferences as well as distance.
This shows the tag clouds of reviews.
This shows the textual review summaries. You can see that the summaries are fairly well formed.
I have also updated part of the demo with reviews crawled using the OpinoFetch method. Here is an example: this summary is for the Hampton Inn in Champaign. It was originally based on the initial reviews crawled from 1 or 2 sources, a total of 26 reviews. With the reviews crawled using OpinoFetch, I obtained about 135 reviews from 8 sources, after doing some filtering. I actually wrote a baseline review extractor to extract the individual reviews; the reviews selected were based on the length of the review (not too short) and a subjectivity score.
In terms of future work, with Opinosis, I would like to look into how to scale up the approach to really large amounts of text; I want to explore the use of the MapReduce framework for this. I also would like to see how the approach works on other types of text, such as tweets, Facebook comments, news articles, and patient health records. For the work on Opinion-Based Entity Ranking, I would like to see how to use query logs and click information to further improve the ranking of entities; this is now possible because everything is logged in the demo system. I would also like to look into the use of phrasal search for ranking. I have not had much success with phrase search and I would like to understand why; perhaps I would have to try a back-off type of approach, where the scoring of entities is first based on the phrase and then without the phrase restriction.
For the work on opinion acquisition, I would like to compare the proposed method with a supervised crawler and also see how to further improve recall using just web search engines. To do this at a reasonable scale, I need to think about how to approximate judgments without relying completely on human judges.