Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions, WWW '12
This work presents a new unsupervised approach to generating ultra-concise summaries of opinions focusing on three aspects: compactness, representativeness and readability.
By: Kavita Ganesan
4. Opinion Summary for iPod Touch
To know more, users must read many redundant sentences.
Structured summaries are useful, but insufficient!
5. Textual opinion summaries:
• big and clear screen
• battery lasts long without wifi
• great selection of apps
………………
Textual summaries can provide more information…
6. Represent the major opinions
Key complaints/praise in text
Important critical information
Readable/well formed
Easily understood by readers
Compact/concise
Can be viewed on all screen sizes
▪ e.g. phones, PDAs, tablets, desktops
Maximize information conveyed
7. Generate a set of non-redundant phrases:
Summarizing key opinions in text
Short (2–5 words) micropinions
Readable
Micropinion summary for a restaurant:
“Good service”
“Delicious soup dishes”
The ultra-concise nature of the phrases allows flexible adjustment of summaries according to display constraints.
9. Extractive summarization: has been widely studied
[Radev et al. 2000; Erkan & Radev 2004; Mihalcea & Tarau 2004…]
Fairly easy to implement
Problem: not suitable for concise summaries
Bias – with a limit on summary size
▪ the selected sentence/phrase may miss critical info
Verbose – may contain irrelevant information
▪ not suitable for smaller devices
10. Abstractive summarization: understand the original text and “re-tell” the story in fewer words – hard to achieve!
Some methods require manual effort
[DeJong 1982; Radev & McKeown 1998; Finley & Harabagiu 2002]
Need to define templates
▪ later filled with info using IE techniques
Some methods rely on deep NL understanding
[Saggion & Lapalme 2002; Jing & McKeown 2000]
Domain dependent
Impractical – high computational costs
11. Keyphrase extraction: goal is to extract important phrases from text
Traditionally used to characterize documents
Can potentially be used to select key opinion phrases
Closest to our goal!
Problem:
Only topic phrases may be selected
▪ e.g. “battery life”, “screen size” as candidate phrases
▪ enough to characterize documents, but we need more info!
Readability is not much of a concern
▪ e.g. “battery short life” == “short battery life”
Most methods are supervised – need training data
[Tomokiyo & Hurst 2003; Witten et al. 1999; Medelyan & Witten 2008]
12. An unsupervised, lightweight & general approach to generating ultra-concise summaries of opinions
The idea is to use existing words in the original text to compose meaningful summaries
Emphasis on 3 aspects:
Compactness
▪ summaries should use as few words as possible
Representativeness
▪ summaries should reflect major opinions in the text
Readability
▪ summaries should be fairly well formed
13. M = argmax_{m1…mk} Σ_{i=1}^{k} [ S_rep(m_i) + S_read(m_i) ]
subject to:
Σ_{i=1}^{k} |m_i| ≤ σ_ss
S_rep(m_i) ≥ σ_rep
S_read(m_i) ≥ σ_read
sim(m_i, m_j) ≤ σ_sim,  ∀ i, j ∈ 1…k
14. The same framework, illustrated with a micropinion summary M (objective values shown):
2.3  very clean rooms
2.1  friendly service
1.8  dirty lobby and pool
1.3  nice and polite staff
15. Each short phrase in the summary M is an m_i:
m1  2.3  very clean rooms
m2  2.1  friendly service
m3  1.8  dirty lobby and pool
mk  1.3  nice and polite staff
16. In the objective, S_rep(m_i) is the representativeness score of m_i and S_read(m_i) is the readability score of m_i.
17. The argmax expression is the objective function.
18. The inequalities under “subject to” are the constraints.
19. Objective function: optimize the representativeness & readability scores. This ensures the summaries reflect key opinions & are reasonably well formed. Example objective function values for a summary M:
m1  2.3  very clean rooms
m2  2.1  friendly service
m3  1.8  dirty lobby and pool
mk  1.3  nice and polite staff
20. Constraint 1 (Σ_{i=1}^{k} |m_i| ≤ σ_ss): maximum length of the summary.
• User adjustable
• Captures compactness
21. Constraints 2 & 3 (S_rep(m_i) ≥ σ_rep, S_read(m_i) ≥ σ_read): minimum representativeness & readability.
• Helps improve efficiency
• Does not affect performance
22. Constraint 4 (sim(m_i, m_j) ≤ σ_sim): maximum similarity between phrases.
• User adjustable
• Captures compactness by minimizing redundancies
23. In summary: the objective ensures representativeness & readability; the size constraint (σ_ss) and similarity constraint (σ_sim) capture compactness.
24. Now, we need to know how to compute:
Similarity scores: sim(m_i, m_j)
Readability scores: S_read(m_i)
Representativeness scores: S_rep(m_i)
25. Scores the similarity between 2 phrases
σ_sim is a user-adjustable parameter
Why important?
Allows the user to control redundancy
E.g. with small devices, users may desire summaries with good coverage of information → fewer redundancies!
Measure used:
Standard Jaccard similarity measure
Other measures could be used (e.g. cosine) – not our focus
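The Jaccard measure named on this slide is simply the intersection over the union of the two phrases' word sets; a minimal sketch:

```python
# Jaccard similarity over word sets, as used for sim(mi, mj).
# Phrases here are illustrative, not from the evaluation data.

def jaccard_sim(phrase_a, phrase_b):
    a = set(phrase_a.lower().split())
    b = set(phrase_b.lower().split())
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)

print(jaccard_sim("very clean rooms", "clean rooms"))  # 2 shared words / 3 total
print(jaccard_sim("friendly service", "dirty lobby"))  # no overlap
```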
26. Purpose: measure how well a phrase represents opinions from the original text
2 properties of a highly representative phrase:
1. Words should be strongly associated in the text
2. Words should be sufficiently frequent in the text
Both are captured by a modified pointwise mutual information (PMI) function
27. Original PMI function, pmi(w_i, w_j) – captures property 1: measures the strength of association between words.
pmi(w_i, w_j) = log2 [ p(w_i, w_j) / ( p(w_i) · p(w_j) ) ]
28. Modified PMI function, pmi'(w_i, w_j) – captures property 2: rewards well-associated words with high co-occurrences by adding c(w_i, w_j), the frequency of co-occurrence within a window.
pmi'(w_i, w_j) = log2 [ p(w_i, w_j) · c(w_i, w_j) / ( p(w_i) · p(w_j) ) ]
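The modified PMI above can be estimated from raw counts; here is a sketch over a tiny made-up corpus, with probabilities taken as count fractions and c(w_i, w_j) counted within a fixed window (the window size and corpus are illustrative assumptions):

```python
import math
from collections import Counter

# Estimate pmi'(wi, wj) = log2( p(wi,wj) * c(wi,wj) / (p(wi) * p(wj)) )
# from co-occurrence counts within a window. Toy corpus for illustration.

def cooccurrence_counts(tokens, window=3):
    word_c, pair_c = Counter(tokens), Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pair_c[tuple(sorted((w, tokens[j])))] += 1
    return word_c, pair_c

def pmi_modified(wi, wj, word_c, pair_c, total):
    c_ij = pair_c[tuple(sorted((wi, wj)))]
    if c_ij == 0:
        return float("-inf")          # never co-occur within the window
    p_ij = c_ij / total
    p_i, p_j = word_c[wi] / total, word_c[wj] / total
    return math.log2(p_ij * c_ij / (p_i * p_j))

tokens = "very clean rooms very clean bed".split()
wc, pc = cooccurrence_counts(tokens)
strong = pmi_modified("very", "clean", wc, pc, len(tokens))  # frequent pair
weak = pmi_modified("rooms", "bed", wc, pc, len(tokens))     # rare pair
```

The c(w_i, w_j) factor is what rewards the frequently co-occurring pair, addressing PMI's bias toward rare words.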
29. To compute the representativeness of a phrase, take the average strength of association (pmi') of each word w_i in the phrase with its C neighboring words:
S_rep(w_1…w_n) = (1/n) Σ_{i=1}^{n} pmi_local(w_i)
pmi_local(w_i) = (1/2C) Σ_{j=i−C}^{i+C} pmi'(w_i, w_j)
30. This gives a good estimate of how strongly associated the words in a phrase are:
S_rep(w_1…w_n) = (1/n) Σ_{i=1}^{n} pmi_local(w_i)
pmi_local(w_i) = (1/2C) Σ_{j=i−C}^{i+C} pmi'(w_i, w_j)
31. Purpose: measure the well-formedness of phrases
Phrases are constructed from seed words
▪ so the summary can contain new phrases not in the original text
No guarantee the phrases will be well formed
Our readability scoring:
Based on Microsoft's Web N-gram model (publicly available)
The n-gram model is used to obtain conditional probabilities of phrases; the chain rule expresses the joint probability in terms of (averaged) conditional probabilities:
S_read(w_1…w_n) = (1/K) Σ_{k=q}^{n} log2 p(w_k | w_{k−q+1} … w_{k−1})
Intuition: a phrase is more readable if it occurs more frequently on the web
32. Example: readability scores of phrases using a tri-gram LM
Ungrammatical:               Grammatical:
“sucks life battery”  −4.51   “battery life sucks”  −2.93
“life battery is poor” −3.66   “battery life is poor” −2.37
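The averaged chain-rule scoring behind these numbers can be sketched with a toy bigram model standing in for the Microsoft Web N-gram service; the probabilities below are made up purely to show the mechanics (well-formed word orders get larger, i.e. less negative, scores):

```python
import math

# S_read: average log2 conditional probability of each word given its
# preceding context. Toy bigram table; real system queries a web-scale LM.

toy_bigram = {("battery", "life"): 0.20, ("life", "sucks"): 0.05,
              ("sucks", "life"): 0.001, ("life", "battery"): 0.002}

def s_read(words, lm=toy_bigram, floor=1e-6):
    logs = [math.log2(lm.get((words[k - 1], words[k]), floor))
            for k in range(1, len(words))]
    return sum(logs) / len(logs)      # K = number of conditional terms

good = s_read(["battery", "life", "sucks"])   # grammatical order
bad = s_read(["sucks", "life", "battery"])    # scrambled order
```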
33. Scoring every candidate phrase is not practical!
Why?
Phrases are composed using words from the original text
The potential solution space would be huge!
Our solution: a greedy summarization algorithm
Explore the solution space with heuristic pruning
Touch only the most promising candidates
34. Step 1: Shortlist high-frequency unigrams from the text to be summarized (count > median).
Step 2: Form seed bigrams by pairing unigrams; shortlist by S_rep (keep S_rep > σ_rep).
Example: unigrams {very, nice, place, clean, dirty, room} → seed bigrams very + nice, very + clean, very + dirty, clean + place, clean + room, dirty + place, …
35. Step 3: Generate higher-order n-grams.
• Concatenate existing candidates with seed bigrams that share an overlapping word
  e.g. very clean + clean rooms = very clean rooms; very dirty + dirty pool = very dirty pool; very nice + nice place = very nice place
• Prune non-promising candidates (S_rep < σ_rep or S_read < σ_read)
• Eliminate redundancies (sim(m_i, m_j))
• Repeat the process on the shortlisted candidates (until no possibility of expansion)
Step 4: Final summary. Sort candidates by objective function value and add phrases until |M| < σ_ss:
0.9  very clean rooms
0.8  friendly service
0.7  dirty lobby and pool
0.5  nice and polite staff
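Steps 1–3 above can be condensed into a sketch like the following. The stand-in scoring function and the toy text are assumptions for illustration; the real system scores candidates with the PMI-based S_rep and the n-gram S_read, and also prunes redundant candidates.

```python
from collections import Counter
from statistics import median

def shortlist_unigrams(tokens):
    """Step 1: keep unigrams whose count exceeds the median count."""
    counts = Counter(tokens)
    m = median(counts.values())
    return sorted(w for w, c in counts.items() if c > m)

def expand(candidates, seed_bigrams, s_rep, sigma_rep):
    """Step 3 (one round): concatenate a candidate with any seed bigram
    that overlaps its last word, pruning by the S_rep threshold."""
    grown = []
    for cand in candidates:
        for b1, b2 in seed_bigrams:
            if cand[-1] == b1:
                new = cand + (b2,)
                if s_rep(new) > sigma_rep:
                    grown.append(new)
    return grown

tokens = "very clean room very clean bed nice".split()
units = shortlist_unigrams(tokens)     # only the repeated words survive
grown = expand([("very", "clean")],
               [("clean", "rooms"), ("dirty", "pool")],
               s_rep=lambda p: 1.0,    # stand-in score, always passes
               sigma_rep=0.5)
```

Repeating `expand` on its own output, with pruning and redundancy checks, grows the seed bigrams into the candidate n-grams of step 3.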
37. Product reviews from CNET (330 products)
Each product has a minimum of 5 reviews
[Figure: a CNET review page – pros, cons, and full review; this content is what gets summarized]
38. Given: textual reviews about a product
Constraints: summary size σ_ss, redundancy σ_sim, representativeness σ_rep, readability σ_read
Task: find M = argmax Σ [ S_rep(m_i) + S_read(m_i) ]
Example micropinion summary M:
m1  2.3  easy to use
m2  2.1  lens is not clear
m3  1.8  too big for pocket
mk  1.3  expensive batteries
39. Human-composed summaries
Two human summarizers
▪ each summarized 165 product reviews (330 total)
Top 10 phrases from pros & cons provided as hints
▪ hints help with topic coverage → reduce bias
Summarizers were asked to compose a set of short phrases (2–5 words) summarizing key opinions on the product
40. 3 representative baselines:
Tf-Idf – unsupervised method commonly used for keyphrase extraction tasks
▪ selected only adjective-containing n-grams (for performance reasons)
▪ redundancy removal used to generate non-redundant phrases
KEA – state-of-the-art supervised keyphrase extraction model [Witten et al. 1999]
▪ uses a Naive Bayes model
▪ trained using 100 review documents withheld from the dataset
Opinosis – unsupervised abstractive summarizer designed to generate textual opinion summaries [Ganesan et al. 2010]
▪ shown to be effective in generating concise opinion summaries
▪ designed for highly redundant text
41. ROUGE – to determine the quality of a summary (ROUGE-1 & ROUGE-2)
Standard measure for summarization tasks
Measures precision & recall of overlapping units between the computer-generated summary & the human summaries
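The n-gram overlap that ROUGE-N measures can be sketched as follows; this is a minimal recall-only illustration with made-up phrases, not the official ROUGE toolkit:

```python
from collections import Counter

# ROUGE-N recall: fraction of the reference summary's n-grams that also
# appear in the system summary (with clipped counts).

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(system, reference, n=2):
    sys_ng = ngrams(system.split(), n)
    ref_ng = ngrams(reference.split(), n)
    overlap = sum(min(c, sys_ng[g]) for g, c in ref_ng.items())
    total = sum(ref_ng.values())
    return overlap / total if total else 0.0

r1 = rouge_n_recall("very clean rooms friendly service",
                    "clean rooms and friendly staff", n=1)
```

Here 3 of the reference's 5 unigrams (clean, rooms, friendly) appear in the system summary.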
42. Questionnaire – to assess the potential utility of generated summaries to users
Answered by 2 assessors; original reviews provided as reference
Rated [1] Very Poor – [5] Very Good on:
Grammaticality [DUC 2005] – are the phrases readable?
Non-redundancy [DUC 2005] – are the phrases in the summary unique?
Informativeness [Filippova 2010] – do the phrases convey important information about the product?
44. [Figure: ROUGE-2 recall vs. summary size (5–30 max words)]
WebNgram: performs the best for this task
Opinosis: much better than KEA & Tf-Idf
KEA: slightly better than Tf-Idf
Tf-Idf: worst performance
45. Intuition: well-formed phrases tend to be longer in general
very clear screen vs. very clear
good battery life vs. good battery
screen is bright vs. screen is
A few longer phrases are more desirable than many fragmented (i.e. short) phrases
46. [Figure: average number of phrases vs. summary size (15–30 max words)]
KEA: generates the most phrases (i.e. favors short phrases)
WebNgram: generates the fewest phrases on average (each phrase is longer)
→ WebNgram phrases are generally more well-formed
47. In our algorithm, n-grams are generated from seed words
▪ potential to form new phrases not in the original text
Why not use existing n-grams?
With redundant opinions, using seen n-grams may be sufficient
We performed a run forcing only seen n-grams to appear as candidate phrases
49. Example unseen n-gram (Acer AL2216 monitor):
“wide screen lcd monitor is bright”
readability: −1.88; representativeness: 4.25
Related snippets in the original text:
“…plus the monitor is very bright…”
“…it is a wide screen, great color, great quality…”
“…this lcd monitor is quite bright and clear…”
50. Purpose of σ_rep & σ_read:
Control minimum representativeness & readability
Helps prune non-promising candidates → improves the efficiency of the algorithm
Without σ_rep & σ_read we would still arrive at a solution, however:
Time to convergence would be much longer
Results could be skewed
These parameters need to be set correctly!
51. [Figure: ROUGE-2 scores at various σ_read settings (−4 to −1) and various σ_rep settings (1 to 5)]
Performance is stable except in extreme conditions, where the thresholds become too restrictive.
52. [Figure: ROUGE-2 scores at various σ_read and σ_rep settings]
Ideal settings:
σ_read: between −2 and −4
σ_rep: between 1 and 4
54. Questionnaire results [score 1–5; < 3 poor, > 3 good]:
            Grammaticality  Non-redundancy  Informativeness
            [DUC 2005]      [DUC 2005]      [Filippova 2010]
WebNgram    4.2/5           3.9/5           3.2/5
TfIdf       2.0/5           2.3/5           1.7/5
Human       4.7/5           4.5/5           3.6/5
TfIdf: poor scores on all 3 aspects (all below 3.0)
55. Same questionnaire results.
WebNgram: all scores above 3.0, close to the human scores.
56. Same questionnaire results.
Informativeness is a subjective aspect; human summaries scored slightly better than WebNgram.
57. Semantic redundancies
“Good sound quality” vs. “Excellent audio” – semantically similar, but treated as different
Due to directly using surface words
Solution: opinion/phrase normalization before summarizing
Some phrases are not opinion phrases
“I bought this for Christmas” – grammatical & representative, but not an opinion
Due to our candidate selection strategy
▪ Plus side: very general approach, can summarize any text
▪ Down side: summaries may be too general
Solution: stricter selection of phrases
▪ e.g. select only opinion-containing phrases
58. Example summary (Canon PowerShot SX120 IS):
Easy to use
Good picture quality
Crisp and clear
Good video quality
Useful for pushing opinions to devices where the screen is small: e-readers, smart phones, tablets
59. An optimization framework to generate ultra-concise summaries of opinions
Emphasis: representativeness, readability & compactness
Evaluation shows our summaries are:
Well formed and convey essential information
More effective than competing methods
Our approach is unsupervised, lightweight & general
Can summarize any other textual content (e.g. news articles, tweets, user comments, etc.)
Editor's Notes
Most current work in opinion summarization focuses on predicting the aspect-based ratings for an entity. For example, for an iPod, you may predict that the appearance is 5 stars, ease of use is 3 stars, etc.
The problem is that non-informative phrases may be selected as candidate phrases. Phrases such as “battery life” and “screen size” may become candidate phrases, since traditionally keyphrase extraction is used only to characterize or tag documents. This is, however, non-informative for an opinion summary.
We try to capture these three criteria using the following optimization framework.
M represents a micropinion summary consisting of a set of short phrases
m1 to mk represent the short phrases in the summary.
Srep(mi) is the representativeness score of mi; Sread(mi) is the readability score of mi.
This equation is the objective function
These are the constraints of the optimization framework
The objective function ensures that the summaries reflect opinions from the original text and are reasonably well formed. This is an example of the objective function values.
The first constraint controls the maximum length of the summary. It is a user-adjustable parameter and helps capture compactness.
This constraint controls the similarity between phrases, which is a user-adjustable parameter. It also captures compactness by minimizing the amount of redundancy.
To score similarity between phrases, we simply use the Jaccard similarity measure.
Next is representativeness. In scoring representativeness, we have defined two properties of highly representative phrases: (1) the words in each phrase should be strongly associated within a narrow window in the original text, and (2) the words in each phrase should be sufficiently frequent in the original text. These two properties are captured by a modified pointwise mutual information function. For readability, the intuition is that if a generated phrase occurs frequently on the web, then the phrase is readable.
This is the original PMI formula. It captures property 1 by measuring the strength of association between two words.
This is the modified PMI formula. We add the frequency of occurrence of words within a narrow window. This captures property 2 by rewarding highly co-occurring word pairs. It also addresses the problem of sparseness, where low-frequency words tend to yield high PMI scores.
To actually score the representativeness of a phrase, we take the average strength of association (pmi') of a given word wi with C of its neighboring words. C is a window size that is empirically set; we will later show the influence of C on the generated summaries.
Readability scoring determines how well formed the constructed phrases are. Since the phrases are constructed from seed words, we can have new phrases that may not have occurred in the original text.
Now let's look at some examples of readability scores using the trigram language model. Say we want to summarize some facts about the battery life…
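The idea can be sketched with a small add-alpha-smoothed trigram language model trained on a background corpus. This is a stand-in for the web-scale n-gram model the talk refers to; the function names, smoothing choice, and vocabulary size are illustrative assumptions:

```python
import math
from collections import Counter

def train(corpus_sentences):
    """Collect trigram counts and their bigram-context counts."""
    bi, tri = Counter(), Counter()
    for sent in corpus_sentences:
        toks = ["<s>", "<s>"] + sent.lower().split()
        for i in range(2, len(toks)):
            bi[(toks[i - 2], toks[i - 1])] += 1
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
    return bi, tri

def readability(phrase, bi, tri, alpha=1.0, vocab_size=10000):
    """Average smoothed trigram log2-probability of the phrase;
    higher means the phrase looks more like fluent attested text."""
    toks = ["<s>", "<s>"] + phrase.lower().split()
    logp, n = 0.0, 0
    for i in range(2, len(toks)):
        c3 = tri[(toks[i - 2], toks[i - 1], toks[i])]
        c2 = bi[(toks[i - 2], toks[i - 1])]
        logp += math.log2((c3 + alpha) / (c2 + alpha * vocab_size))
        n += 1
    return logp / n if n else float("-inf")
```

A well-formed phrase like "battery life is" scores higher than a scrambled variant like "life battery is", matching the intuition that frequently attested word sequences are more readable.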
Now we have all the scoring components that we need for the optimization framework. But we cannot score every candidate phrase: because we compose phrases using words from the original text, the potential solution space can be huge. Our solution is a greedy algorithm that explores the solution space with heuristic pruning, so that we only touch the most promising candidates.
From the input text to be summarized (some review texts), we shortlist a set of high-frequency unigrams whose count is larger than the median count (not counting 1's). This acts as an initial reduction of the search space. From these unigrams, we generate seed bigrams by pairing each unigram with every other unigram. We compute the representativeness scores of the seed bigrams and keep only those where Srep > σrep. The next step tries to grow the seed bigrams into higher-order promising n-grams, by concatenating existing candidates with bigrams that share an overlapping word. At any point, if the representativeness or readability score falls below the corresponding threshold, the n-gram is pruned. We also check the redundancy of the generated candidates: if two phrases have a similarity higher than the similarity threshold, we discard the one with the lower combined score of Srep + Sread.
The third step is where we generate higher-order n-grams that act as candidate summaries. This is done by concatenating existing candidates with seed bigrams that share an overlapping word, giving us the new candidates. Non-promising candidates are then pruned based on their representativeness and readability scores, which further reduces the overall candidate search space. We also check for redundancy: if two candidates have high similarity scores, we discard one of them. This process repeats for the shortlisted candidates until there are no further expansion possibilities. Then, to get the final summary, we sort the candidate n-grams based on their objective function values (i.e., the sum of Srep and Sread) and gradually add phrases until the maximum summary length is reached.
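The whole greedy pipeline described above can be sketched as follows. The scorers `s_rep` and `s_read` are supplied by the caller, and the function name, default thresholds, and phrase-length caps are illustrative assumptions rather than the paper's exact settings:

```python
from collections import Counter
from statistics import median

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def generate_micropinions(text, s_rep, s_read, sigma_rep=0.0,
                          sigma_read=-20.0, sigma_sim=0.6,
                          max_phrase_len=5, max_summary_words=12):
    """Greedy micropinion generation sketch: shortlist unigrams, seed
    bigrams, grow candidates on overlapping words with pruning, then
    rank by Srep + Sread and fill the summary length budget."""
    counts = Counter(text.lower().split())
    # Step 1: high-frequency unigrams above the median count (ignoring 1's).
    repeated = [c for c in counts.values() if c > 1]
    med = median(repeated) if repeated else 1
    seeds = [w for w, c in counts.items() if c > 1 and c >= med]
    # Step 2: seed bigrams that meet the representativeness threshold.
    bigrams = [f"{a} {b}" for a in seeds for b in seeds
               if a != b and s_rep(f"{a} {b}") > sigma_rep]
    # Step 3: grow candidates by concatenating on an overlapping word,
    # pruning any n-gram whose scores fall below the thresholds.
    candidates, frontier = list(bigrams), list(bigrams)
    while frontier:
        grown = []
        for cand in frontier:
            if len(cand.split()) >= max_phrase_len:
                continue
            for bg in bigrams:
                first, second = bg.split()
                if first == cand.split()[-1]:
                    p = f"{cand} {second}"
                    if (s_rep(p) > sigma_rep and s_read(p) > sigma_read
                            and p not in candidates and p not in grown):
                        grown.append(p)
        candidates.extend(grown)
        frontier = grown
    # Step 4: redundancy removal + rank by objective, fill the budget.
    candidates.sort(key=lambda p: s_rep(p) + s_read(p), reverse=True)
    summary, kept = [], []
    for p in candidates:
        if all(jaccard(p, q) < sigma_sim for q in kept):
            if (sum(len(x.split()) for x in summary)
                    + len(p.split()) <= max_summary_words):
                summary.append(p)
            kept.append(p)
    return summary
```

With trivial scorers this returns a small set of non-redundant phrases built from the most frequent words, within the summary length budget.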
Moving into the evaluation part….
To evaluate this summarization task, we use product reviews from CNET for 330 different products, each with a minimum of 5 reviews. A CNET review has 3 parts: pros, cons and the full review. We ignored the pros and cons and focused only on summarizing the full reviews.
As our first baseline, we use TF-IDF, an unsupervised, language-independent method commonly used for keyphrase extraction tasks. To make the TF-IDF baseline competitive, we limit the n-grams in consideration to those that contain at least one adjective (i.e., favoring opinion-bearing n-grams); the performance is much worse without this selection step. We then used the same redundancy removal technique used in our approach to allow a set of non-redundant phrases to be generated. For our second baseline, we use KEA, a state-of-the-art keyphrase extraction method. KEA builds a Naive Bayes model with known keyphrases, and then uses the model to find keyphrases in new documents. The KEA model was trained using 100 review documents withheld from our dataset. With KEA as a baseline, we gain insight into the applicability of existing keyphrase extraction approaches to the task of micropinion generation; strictly speaking, this is an unfair comparison, since KEA is supervised while our approach is not. As our final baseline, we use Opinosis, an abstractive summarizer designed to generate textual summaries of opinions.
3 key questions rated on a scale from 1 (Very Poor) to 5 (Very Good). Grammaticality: Are the phrases in the summary readable with no obvious errors? Non-redundancy: Are the phrases in the summary unique, without unnecessary repetitions such as repeated facts or repeated use of noun phrases? Informativeness: Do the phrases convey important information regarding the product? This can be positive/negative opinions about the product or some critical facts about the product. Note that the first two aspects, grammaticality and non-redundancy, are linguistic questions used at the 2005 Document Understanding Conference (DUC) [4]. The last aspect, informativeness, has been used in other summarization tasks and is key in measuring how much users would learn from the summaries.
-This graph shows the ROUGE scores of the different summarization methods for different summary lengths.
-We compare KEA (a supervised keyphrase extraction model), a simple TF-IDF-based keyphrase extraction method, and Opinosis.
-TF-IDF performs the worst. This is likely due to insufficient redundancies of meaningful n-grams.
-KEA does only slightly better than TF-IDF even though KEA is a supervised approach. While KEA is suitable for characterizing documents, such a supervised approach proves insufficient for the task of generating micropinions. The model may need a more sophisticated set of features to generate more meaningful summaries.
-Opinosis performs better than both KEA and TF-IDF.
-For this task, WebNgram performs the best.
Phrases that are well formed tend, in general, to be longer. Thus, a few readable phrases are more desirable than many fragmented (or very short) phrases. For example, given the two phrases "very clear screen" and "very clear", it is obvious that the more readable phrase is the first one, as it is not fragmented and is well formed. This well-formed phrase is also longer than the short, fragmented one. So now we will look at the average number of phrases generated by each summarization method.
-This graph shows the average number of phrases generated for different summary lengths using different summarization methods.
-Here we can see that KEA generates the largest number of phrases, so it tends to favor very short phrases. While those phrases may represent characteristics of the document, they are not well formed enough to convey essential opinions.
In our candidate generation approach, we form n-grams from seed words drawn from the original text, so we can potentially form new phrases not present in the text. It is reasonable to think that, with sufficiently redundant opinions, searching the restricted space of all seen n-grams may be enough to generate reasonable summaries. To test this hypothesis, we performed a run forcing only seen n-grams to appear as candidate summaries.
-This graph shows the ROUGE-2 recall scores using seen n-grams vs. system generated n-grams for different summary lengths. -From these results it is clear that our search algorithm actually helps discover useful new phrases
-Now we will look at the stability of the non-user-dependent parameters, sigma rep and sigma read.
-The purpose of these two parameters is to control the minimum representativeness and readability scores.
-This helps us prune non-promising candidates, which in turn improves the efficiency of the algorithm.
-Without these two parameters we would still arrive at a reasonable solution, but convergence would take much longer and the results could be skewed (e.g., very representative but not readable), so these parameters need to be set correctly.
-The first graph shows the ROUGE-2 recall scores at different sigma read settings.
-The second graph shows the ROUGE-2 recall scores at different sigma rep settings.
-We can see that performance is stable across settings, except in extreme conditions when sigma read or sigma rep is too restrictive.
The ideal setting is thus…
-To assess the potential utility of the generated summaries to real users, a subset of the summaries was manually evaluated.
-As mentioned earlier, we look at 3 aspects: grammaticality, non-redundancy and informativeness.
-Each is scored on a scale from 1 to 5.
-A score below 3 is considered `poor' and a score above 3 is considered `good'.
-These are the results for the 3 aspects.
-The results on human summaries serve as an upper bound for comparison.
-Earlier we showed that TF-IDF had the lowest ROUGE scores amongst all the approaches, indicating that its summaries may not be very useful.
-Human assessors agree with this conclusion, as TF-IDF summaries received poor scores (all below 3) on all three dimensions compared to WebNgram and human summaries.
In summary, in this work we proposed an optimi….with special emphasis on…of the generated summaries.Our evaluation shows that the …We give special emphasis on….