Micropinion Generation


Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions, WWW '12

This work presents a new unsupervised approach to generating ultra-concise summaries of opinions focusing on three aspects: compactness, representativeness and readability.

By: Kavita Ganesan

Notes for slides
  • Most current work in opinion summarization focuses on predicting aspect-based ratings for an entity. For example, for an iPod, you may predict that the appearance is 5 stars, ease of use is 3 stars, and so on.
  • The problem is that non-informative phrases may be selected as candidate phrases. Phrases such as battery life and screen size may become candidates since keyphrase extraction is traditionally used only to characterize or tag documents. This, however, is not informative for an opinion summary.
  • We try to capture these three criteria using the following optimization framework.
  • M represents a micropinion summary consisting of a set of short phrases
  • m1 to mk represent the short phrases in the summary.
  • Srep(mi) is the representativeness score of mi; Sread(mi) is the readability score of mi.
  • This equation is the objective function
  • These are the constraints of the optimization framework
  • The objective function ensures that the summaries reflect opinions from the original text and are reasonably well formed. This is an example of the objective function values.
  • The first constraint is a threshold that controls the maximum length of the summary. It is a user-adjustable parameter and helps capture compactness.
  • This constraint controls the similarity between phrases and is a user-adjustable parameter. It also captures compactness by minimizing the amount of redundancy.
  • To score similarity between phrases, we simply use the Jaccard similarity measure.
  • Next is representativeness. In scoring representativeness, we define two properties of highly representative phrases: (1) the words in each phrase should be strongly associated within a narrow window in the original text, and (2) the words in each phrase should be sufficiently frequent in the original text. These two properties are captured by a modified pointwise mutual information function. For readability, the intuition is that if a generated phrase occurs frequently on the web, then it is readable.
  • This is the original PMI formula. It captures property 1 by measuring the strength of association between two words.
  • This is the modified PMI formula. We add the frequency of occurrence of words within a narrow window. This captures property 2 by rewarding highly co-occurring word pairs. It also addresses the problem of sparseness, where low-frequency words tend to yield high PMI scores.
  • So, to actually score the representativeness of a phrase, we take the average strength of association (pmi') of a given word wi with C of its neighboring words. C is a window size that is set empirically; we will later show the influence of C on the generated summaries.
  • Readability scoring determines how well formed the constructed phrases are. Since the phrases are constructed from seed words, we can have new phrases that may not have occurred in the original text.
  • Now let's look at some examples of readability scores using the trigram language model. Let us say we want to summarize some facts about the battery life.
  • Now we have all the scoring components that we need for the optimization framework. But we cannot score every candidate phrase, because we are composing phrases using words from the original text, so the potential solution space can be huge. Our solution is to use a greedy algorithm that explores the solution space with heuristic pruning so that we only touch the most promising candidates.
  • From the input text to be summarized, which is basically some review text, we shortlist a set of high-frequency unigrams whose count is larger than the median count (not counting 1's). This acts as an initial reduction of the search space. Then, from these unigrams, we generate seed bigrams by pairing each unigram with every other unigram. We compute the representativeness of the seed bigrams and keep only those that meet the minimum requirement (Srep > σrep).
  • The third step is where we generate higher-order n-grams that act as candidate summaries. This is done by concatenating existing candidates with seed bigrams that share an overlapping word, which gives the new candidates. Non-promising candidates are then pruned based on their representativeness and readability scores, further reducing the overall candidate search space. We also check the redundancy of the generated candidates: if two phrases have a similarity higher than the similarity threshold, we discard the one with the lower combined score of Srep + Sread. This process repeats for the shortlisted candidates until there are no more expansion possibilities. To get the final summary, we sort the candidate n-grams by their objective function values (i.e., the sum of Srep and Sread) and gradually add phrases until the maximum summary length is reached.
  • Moving into the evaluation part….
  • To evaluate this summarization task, we use product reviews from CNET for 330 different products, each with a minimum of 5 reviews. A CNET review has three parts: pros, cons, and the full review. We ignored the pros and cons and focused only on summarizing the full reviews.
  • As our first baseline, we use TF-IDF, an unsupervised, language-independent method commonly used for keyphrase extraction tasks. To make the TF-IDF baseline competitive, we limit the n-grams in consideration to those that contain at least one adjective (i.e., favoring opinion-containing n-grams); performance is much worse without this selection step. We then used the same redundancy removal technique used in our approach to allow a set of non-redundant phrases to be generated. For our second baseline, we use KEA, a state-of-the-art keyphrase extraction method. KEA builds a Naive Bayes model with known keyphrases and then uses the model to find keyphrases in new documents. The KEA model was trained using 100 review documents withheld from our dataset. With KEA as a baseline, we gain insight into the applicability of existing keyphrase extraction approaches to the task of micropinion generation; strictly speaking, this is not a fair comparison. As our final representative baseline, we use Opinosis, an abstractive summarizer designed to generate textual summaries of opinions.
  • Three key questions, each rated on a scale from 1 (Very Poor) to 5 (Very Good). Grammaticality: are the phrases in the summary readable with no obvious errors? Non-redundancy: are the phrases in the summary unique, without unnecessary repetitions such as repeated facts or repeated use of noun phrases? Informativeness: do the phrases convey important information regarding the product? This can be positive/negative opinions about the product or some critical facts about it. Note that the first two aspects, grammaticality and non-redundancy, are linguistic questions used at the 2005 Document Understanding Conference (DUC) [4]. The last aspect, informativeness, has been used in other summarization tasks and is key in measuring how much users would learn from the summaries.
  • This graph shows the ROUGE scores of the different summarization methods for different summary lengths. KEA is a supervised keyphrase extraction model, TF-IDF is a simple keyphrase-extraction baseline, and Opinosis is an abstractive opinion summarizer. TF-IDF performs the worst, likely due to insufficient redundancies of meaningful n-grams. KEA does only slightly better than TF-IDF even though it is a supervised approach; while KEA is suitable for characterizing documents, such a supervised approach proves insufficient for generating micropinions, and the model may need a more sophisticated set of features to generate more meaningful summaries. Opinosis performs better than both KEA and TF-IDF. For this task, WebNgram performs the best.
  • Well-formed phrases tend, in general, to be longer. Thus, a few readable phrases are more desirable than many fragmented (or very short) phrases. For example, given the two phrases "very clear screen" and "very clear", the more readable phrase is obviously the first: it is not fragmented, it is well formed, and it is also longer than the short fragmented one. So now we will look at the average number of phrases generated by each summarization method.
  • This graph shows the average number of phrases generated for different summary lengths using the different summarization methods. We can see that KEA generates the most phrases, so it tends to favor very short phrases. While those phrases may represent characteristics of the document, they are not well formed enough to convey essential opinions.
  • In our candidate generation approach, we form n-grams from seed words from the original text, so there is a potential of forming new phrases not in the text. It is reasonable to think that, with sufficiently redundant opinions, searching the restricted space of all seen n-grams may be enough for generating reasonable summaries. To test this hypothesis, we performed a run forcing only seen n-grams to appear as candidate summaries.
  • This graph shows the ROUGE-2 recall scores using seen n-grams vs. system-generated n-grams for different summary lengths. From these results it is clear that our search algorithm actually helps discover useful new phrases.
  • Now we will look at the stability of the non-user-dependent parameters, σrep and σread. The purpose of these two parameters is to control the minimum representativeness and readability scores. This helps us prune non-promising candidates, which in turn improves the efficiency of the algorithm. Without these two parameters we would still arrive at a reasonable solution, but the time to convergence would be much longer and the results could be skewed (e.g., very representative but not readable), so these parameters need to be set correctly.
  • The first graph shows the ROUGE-2 recall scores at different σread settings, and the second shows the ROUGE-2 recall scores at different σrep settings. Here we can see that performance is stable across settings except in extreme conditions, when σread or σrep is too restrictive.
  • The ideal setting is thus…
  • To assess the potential utility of the generated summaries to real users, a subset of the summaries was manually evaluated. As mentioned earlier, we look at three aspects: grammaticality, non-redundancy, and informativeness, each scored on a scale from 1 to 5. A score below 3 is considered `poor' and a score above 3 is considered `good'. These are the results for the three aspects; the results on human summaries serve as an upper bound for comparison.
  • Earlier we showed that TF-IDF had the lowest ROUGE scores among all the approaches, indicating that its summaries may not be very useful. Human assessors agree with this conclusion, as TF-IDF summaries received poor scores (all below 3) on all three dimensions compared to WebNgram and human summaries.
  • In summary, in this work we proposed an optimization framework for generating ultra-concise opinion summaries, with special emphasis on the representativeness, readability, and compactness of the generated summaries. Our evaluation shows that the generated summaries are well formed, convey essential information, and are more effective than those of competing methods.
  • Transcript

    • 1. Kavita Ganesan, ChengXiang Zhai & Evelyne Viegas (Go to project page)
    • 2. Opinion Summary for iPod Touch. Current methods: focus on generating aspect-based ratings for an entity [Lu et al., 2009; Lerman et al., 2009; …]
    • 3. Opinion Summary for iPod Touch. To know more: read many redundant sentences.
    • 4. Opinion Summary for iPod Touch. To know more: read many redundant sentences. Structured summaries are useful, but insufficient!
    • 5. Textual opinion summaries: • big and clear screen • battery lasts long without wifi • great selection of apps. Textual summaries can provide more information…
    • 6. Represent the major opinions: key complaints/praise in text, important critical information. Readable/well formed: easily understood by readers. Compact/concise: can be viewed on all screen sizes (e.g. phones, PDAs, tablets, desktops); maximize information conveyed.
    • 7. Generate a set of non-redundant phrases ("micropinions"): summarizing key opinions in text, short (2-5 words), readable. Micropinion summary for a restaurant: "Good service", "Delicious soup dishes". The ultra-concise nature of the phrases allows flexible adjustment of summaries according to display constraints.
    • 8. Has been widely studied [Radev et al. 2000; Erkan & Radev, 2004; Mihalcea & Tarau, 2004…]. Fairly easy to implement. Problem: not suitable for concise summaries. Bias (with a limit on summary size): the selected sentence/phrase may have missed critical info. Verbose: may contain irrelevant information, not suitable for smaller devices.
    • 9. Understand the original text and "re-tell" the story in fewer words. Hard to achieve! Some methods require manual effort [DeJong 1982; Radev & McKeown 1998; Finley & Harabagiu 2002]: need to define templates, later filled with info using IE techniques. Some methods rely on deep NL understanding [Saggion & Lapalme 2002; Jing & McKeown 2000]: domain dependent, impractical due to high computational costs.
    • 10. Goal is to extract important phrases from text. Traditionally used to characterize documents. Can be potentially used to select key opinion phrases. Closest to our goal! Problem: only topic phrases may be selected ▪ E.g. battery life, screen size → candidate phrases ▪ Enough to characterize documents, but we need more info! Readability aspect is not much of a concern ▪ E.g. "Battery short life" == "Short battery life". Most methods are supervised – need training data [Tomokiyo & Hurst 2003; Witten et al. 1999; Medelyan & Witten 2008]
    • 11.  Unsupervised, lightweight & general approach to generating ultra-concise summaries of opinions Idea is to use existing words in original text to compose meaningful summaries Emphasis on 3 aspects:  Compactness ▪ summaries should use as few words as possible  Representativeness ▪ summaries should reflect major opinions in text  Readability ▪ summaries should be fairly well formed
    • 12. The optimization framework: M = argmax over {m1...mk} of Σ_{i=1..k} [ Srep(mi) + Sread(mi) ], subject to: Σ_{i=1..k} |mi| ≤ σss; Srep(mi) ≥ σrep; Sread(mi) ≥ σread; sim(mi, mj) ≤ σsim for all i, j in 1..k.
    • 13. Micropinion summary, M (example): 2.3 very clean rooms; 2.1 friendly service; 1.8 dirty lobby and pool; 1.3 nice and polite staff.
    • 14. Each short phrase in the summary: m1 = very clean rooms (2.3), m2 = friendly service (2.1), m3 = dirty lobby and pool (1.8), ..., mk = nice and polite staff (1.3).
    • 15. Srep(mi) is the representativeness score of mi; Sread(mi) is the readability score of mi.
    • 16. The objective function: argmax Σ_{i=1..k} [ Srep(mi) + Sread(mi) ].
    • 17. The four inequalities are the constraints of the framework.
    • 18. Objective function: optimize representativeness & readability scores. Ensures summaries reflect key opinions and are reasonably well formed. Each phrase's objective function value is Srep(mi) + Sread(mi), e.g. 2.3 for very clean rooms.
    • 19. Constraint 1 (Σ |mi| ≤ σss): maximum length of the summary. User adjustable; captures compactness.
    • 20. Constraints 2 & 3 (Srep(mi) ≥ σrep, Sread(mi) ≥ σread): minimum representativeness & readability. Help improve efficiency; do not affect performance.
    • 21. Constraint 4 (sim(mi, mj) ≤ σsim): maximum similarity between phrases. User adjustable; captures compactness by minimizing redundancies.
    • 22. The objective function ensures representativeness & readability; the summary-size and similarity constraints capture compactness.
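For concreteness, here is a minimal, hypothetical sketch (not the authors' code) of how a candidate summary could be scored and checked against these constraints; the srep/sread score dictionaries, the sim function, and the threshold values are all assumed inputs.

```python
# Minimal sketch (not the authors' code): objective value and constraint check
# for a candidate micropinion summary. `srep` and `sread` are assumed dicts of
# precomputed scores; `sim` is an assumed pairwise-similarity function; the
# sigma_* thresholds are illustrative placeholders.

def objective(phrases, srep, sread):
    # Objective: sum of representativeness + readability over chosen phrases.
    return sum(srep[p] + sread[p] for p in phrases)

def is_feasible(phrases, srep, sread, sim,
                sigma_ss=15, sigma_rep=1.0, sigma_read=-4.0, sigma_sim=0.5):
    # Constraint 1: total number of words must not exceed the size budget.
    if sum(len(p.split()) for p in phrases) > sigma_ss:
        return False
    # Constraints 2 & 3: minimum representativeness and readability per phrase.
    if any(srep[p] < sigma_rep or sread[p] < sigma_read for p in phrases):
        return False
    # Constraint 4: maximum pairwise similarity between phrases.
    return all(sim(a, b) <= sigma_sim
               for i, a in enumerate(phrases) for b in phrases[i + 1:])
```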
    • 23. Now, we need to know how to compute: similarity scores sim(mi, mj), readability scores Sread(mi), and representativeness scores Srep(mi).
    • 24. Score similarity between 2 phrases. User-adjustable parameter. Why important? It allows the user to control redundancy; e.g. with small devices, users may desire summaries with good coverage of information and fewer redundancies! Measure used: the standard Jaccard similarity measure (other measures, e.g. cosine, could be used; not our focus). A minimal sketch of this check follows below.
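The following sketch illustrates the Jaccard check from slide 24, computed over word sets; it is an illustration, not the authors' implementation.

```python
# Sketch: Jaccard similarity between two phrases, computed over word sets.

def jaccard_sim(phrase_a: str, phrase_b: str) -> float:
    a, b = set(phrase_a.lower().split()), set(phrase_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Example: jaccard_sim("very clean rooms", "clean rooms") == 2/3
```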
    • 25.  Purpose: Measure how well a phrase represents opinions from the original text? 2 properties of a highly representative phrase: 1. Words should be strongly associated in text 2. Words should be sufficiently frequent in text Captured by a modified pointwise mutual information (PMI) function
    • 26. Original PMI function: pmi(wi, wj) = log2 [ p(wi, wj) / ( p(wi) p(wj) ) ]. Captures property 1: measures the strength of association between words.
    • 27. Modified PMI function: pmi'(wi, wj) = log2 [ p(wi, wj) c(wi, wj) / ( p(wi) p(wj) ) ], where c(wi, wj) is the frequency of co-occurrence within a narrow window. Captures property 2: rewards well-associated words with high co-occurrences.
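A small sketch of the original and modified PMI from slides 26-27, assuming simple corpus statistics (word counts, windowed co-occurrence counts, and the corresponding totals) are available; the argument names are hypothetical.

```python
import math

def pmi(wi, wj, count, cooc, n_tokens, n_pairs):
    # Original PMI: log2 [ p(wi, wj) / (p(wi) p(wj)) ].
    p_ij = cooc[(wi, wj)] / n_pairs
    return math.log2(p_ij / ((count[wi] / n_tokens) * (count[wj] / n_tokens)))

def pmi_mod(wi, wj, count, cooc, n_tokens, n_pairs):
    # Modified PMI: multiply the joint probability by the raw co-occurrence
    # count c(wi, wj), rewarding frequent, well-associated pairs and damping
    # the sparse low-frequency pairs that inflate plain PMI.
    c_ij = cooc[(wi, wj)]
    p_ij = c_ij / n_pairs
    return math.log2((p_ij * c_ij) /
                     ((count[wi] / n_tokens) * (count[wj] / n_tokens)))
```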
    • 28. To compute the representativeness of a phrase, take the average strength of association (pmi') of each word wi in the phrase with its C neighboring words: Srep(w1..wn) = (1/n) Σ_{i=1..n} pmi_local(wi), where pmi_local(wi) = (1/2C) Σ_{j=i-C..i+C} pmi'(wi, wj).
    • 29. This gives a good estimate of how strongly associated the words in a phrase are (same Srep formula as slide 28).
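A sketch of the Srep computation on slides 28-29; `pmi_mod_fn` is an assumed callable returning pmi'(wi, wj), and the window size C is illustrative.

```python
def s_rep(words, pmi_mod_fn, C=3):
    # words: the phrase as a list of tokens; pmi_mod_fn(wi, wj) -> pmi' score.
    def pmi_local(i):
        # Average pmi' of word i with up to C neighbours on each side.
        neighbours = [j for j in range(i - C, i + C + 1)
                      if 0 <= j < len(words) and j != i]
        return sum(pmi_mod_fn(words[i], words[j]) for j in neighbours) / (2 * C)
    return sum(pmi_local(i) for i in range(len(words))) / len(words)
```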
    • 30. Purpose: measure the well-formedness of phrases. Phrases are constructed from seed words, so we can have new phrases not in the original text, with no guarantee that they are well formed. Our readability scoring is based on Microsoft's Web N-gram model (publicly available); the n-gram model provides the conditional probabilities of phrases, and the chain rule expresses the joint probability in terms of (averaged) conditional probabilities: Sread(w1...wn) = (1/K) Σ_k log2 p(wk | w_{k-q+1} ... w_{k-1}). Intuition: a phrase is more readable if it occurs more frequently on the web.
    • 31. Example: readability scores of phrases using a tri-gram LM. Ungrammatical: "sucks life battery" -4.51, "life battery is poor" -3.66. Grammatical: "battery life sucks" -2.93, "battery life is poor" -2.37.
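A sketch of the readability score on slide 30, assuming access to an n-gram language model through a hypothetical `cond_prob(word, context)` lookup (the slides use Microsoft's Web N-gram service; any trigram model has the same shape).

```python
import math

def s_read(words, cond_prob, q=3):
    # words: phrase tokens; cond_prob(w, context) -> p(w | context) from an LM.
    # Average the log2 conditional probability of each word given up to q-1
    # preceding words (the chain-rule decomposition, averaged).
    terms = [math.log2(cond_prob(words[k], tuple(words[max(0, k - q + 1):k])))
             for k in range(len(words))]
    return sum(terms) / len(terms)

# Well-formed phrases that occur often on the web score higher, matching the
# slide-31 examples ("battery life sucks" -2.93 vs. "sucks life battery" -4.51).
```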
    • 32.  Scoring each candidate phrase is not practical! Why?  Phrases composed using words from original text  Potential solution space would be Huge! Our solution: greedy summarization algorithm  Explore solution space with heuristic pruning  Touch only the most promising candidates
    • 33. Step 1: from the input (the text to be summarized), shortlist high-frequency unigrams (count > median), e.g. very, nice, place, clean, problem, dirty, room. Step 2: form seed bigrams by pairing unigrams (very + nice, very + clean, very + dirty, clean + place, clean + room, dirty + place, …); shortlist by representativeness (Srep > σrep).
    • 34. Step 3: generate higher-order n-grams. Concatenate existing candidates with seed bigrams that share an overlapping word (e.g. very clean + clean rooms = very clean rooms, very clean + clean bed = very clean bed, very dirty + dirty room = very dirty room, very dirty + dirty pool = very dirty pool, very nice + nice place = very nice place, very nice + nice room = very nice room). Prune non-promising candidates (Srep < σrep or Sread < σread), eliminate redundancies (sim(mi, mj)), and repeat on the shortlisted candidates until there is no possibility of expansion. Step 4: final summary. Sort candidates by objective function value and add phrases until |M| < σss, e.g.: 0.9 very clean rooms, 0.8 friendly service, 0.7 dirty lobby and pool, 0.5 nice and polite staff. A condensed sketch of these steps appears below.
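The following is a condensed, hypothetical sketch of steps 1-4 above. The scorers `s_rep(phrase)`, `s_read(phrase)`, and `jaccard_sim(a, b)` are assumed callables along the lines of the earlier sketches, and the thresholds are placeholders; the redundancy handling is simplified (the slides keep whichever of two similar phrases has the higher combined score).

```python
from collections import Counter
from statistics import median

def generate_summary(tokens, s_rep, s_read, jaccard_sim,
                     sigma_rep=1.0, sigma_read=-4.0, sigma_sim=0.5, sigma_ss=15):
    # Step 1: shortlist high-frequency unigrams (count > median, ignoring count-1 words).
    counts = {w: c for w, c in Counter(tokens).items() if c > 1}
    med = median(counts.values()) if counts else 0
    unigrams = [w for w, c in counts.items() if c > med]

    # Step 2: seed bigrams from unigram pairs, kept only if representative enough.
    seeds = [f"{a} {b}" for a in unigrams for b in unigrams
             if a != b and s_rep(f"{a} {b}") >= sigma_rep]

    # Step 3: grow candidates with seed bigrams that overlap by one word,
    # pruning by representativeness/readability and dropping redundant phrases.
    candidates, frontier = list(seeds), list(seeds)
    while frontier:
        grown = []
        for cand in frontier:
            for seed in seeds:
                first, rest = seed.split()
                if first == cand.split()[-1]:
                    phrase = f"{cand} {rest}"
                    if (phrase not in grown
                            and len(phrase.split()) <= 5   # micropinions are 2-5 words
                            and s_rep(phrase) >= sigma_rep
                            and s_read(phrase) >= sigma_read
                            and all(jaccard_sim(phrase, c) <= sigma_sim
                                    for c in candidates)):
                        grown.append(phrase)
        candidates.extend(grown)
        frontier = grown

    # Step 4: sort by objective value; add phrases until the word budget is used.
    ranked = sorted(candidates, key=lambda p: s_rep(p) + s_read(p), reverse=True)
    summary, used = [], 0
    for phrase in ranked:
        if used + len(phrase.split()) <= sigma_ss:
            summary.append(phrase)
            used += len(phrase.split())
    return summary
```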
    • 35. Product reviews from CNET (330 products); each product has a minimum of 5 reviews. A CNET review has pros, cons, and a full review; only the full-review content is summarized.
    • 36. Given: textual reviews about a product. Task: produce a micropinion summary M that maximizes argmax Σ [ Srep(mi) + Sread(mi) ], subject to constraints on summary size (σss), redundancy (σsim), representativeness (σrep), and readability (σread). Example summary: m1 2.3 easy to use; m2 2.1 lens is not clear; m3 1.8 too big for pocket; …; mk 1.3 expensive batteries.
    • 37. Human-composed summaries. Two human summarizers, each summarized 165 product reviews (total 330). Top 10 phrases from pros & cons provided as hints; hints help with topic coverage and reduce bias. Summarizers were asked to compose a set of short phrases (2-5 words) summarizing key opinions on the product.
    • 38. 3 representative baselines: Tf-Idf – unsupervised method commonly used for keyphrase extraction tasks ▪ Selected only adjective-containing n-grams (performance reasons) ▪ Redundancy removal used to generate non-redundant phrases. KEA – state-of-the-art supervised keyphrase extraction model [Witten et al. 1999] ▪ Uses a Naive Bayes model ▪ Trained using 100 review documents withheld from the dataset. Opinosis – unsupervised abstractive summarizer designed to generate textual opinion summaries [Ganesan et al. 2010] ▪ Shown to be effective in generating concise opinion summaries ▪ Designed for highly redundant text.
    • 39.  ROUGE - to determine quality of summary (ROUGE-1 & ROUGE-2)  Standard measure for summarization tasks  Measures precision & recall of overlapping units between computer generated summary & human summaries
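A small sketch of the ROUGE-2 recall computation used in these evaluations: the fraction of the reference summary's bigrams that also appear in the system summary (clipped counts). This is a simplified stand-in for the standard ROUGE toolkit.

```python
from collections import Counter

def bigrams(text):
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def rouge2_recall(system_summary, reference_summary):
    sys_bg, ref_bg = bigrams(system_summary), bigrams(reference_summary)
    if not ref_bg:
        return 0.0
    # Clipped overlap: count each reference bigram at most as often as it
    # appears in the system summary.
    overlap = sum(min(c, sys_bg[bg]) for bg, c in ref_bg.items())
    return overlap / sum(ref_bg.values())
```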
    • 40. Questionnaire – to assess the potential utility of generated summaries to users. Answered by 2 assessors; original reviews provided as reference. Rated from [1] Very Poor to [5] Very Good. Grammaticality [DUC 2005]: are the phrases readable? Non-redundancy [DUC 2005]: are the phrases in the summary unique? Informativeness [Filippova 2010]: do the phrases convey important information about the product?
    • 41.  Our approach is referred to as WebNGram
    • 42. [Graph: ROUGE-2 recall vs. summary size (5 to 30 max words) for KEA, Tfidf, Opinosis, and WebNGram.] Tfidf: worst performance. KEA: slightly better than Tfidf. Opinosis: much better than KEA & Tfidf. WebNgram: performs the best for this task.
    • 43. Intuition: well-formed phrases tend to be longer in general. very clear screen vs. very clear; good battery life vs. good battery; screen is bright vs. screen is. A few longer phrases are more desirable than many fragmented (i.e. short) phrases.
    • 44. [Graph: average number of phrases vs. summary size (max words) for WebNgram, Tfidf, Opinosis, and Kea.] KEA generates the most phrases (i.e. it favors short phrases). WebNGram generates the fewest phrases on average (each phrase is longer); WebNGram phrases are generally more well formed.
    • 45. In our algorithm, n-grams are generated from seed words → potential of forming new phrases not in the original text. Why not use existing n-grams? With redundant opinions, using seen n-grams may be sufficient. Performed a run by forcing only seen n-grams to appear as candidate phrases.
    • 46. [Graph: ROUGE-2 recall vs. summary size (5 to 30 max words) for system-generated n-grams (WebNGram) vs. seen n-grams (WebNGramseen).] Our search algorithm helps discover useful new phrases.
    • 47. Example: unseen n-gram (Acer AL2216 Monitor): "wide screen lcd monitor is bright" (readability: -1.88; representativeness: 4.25). Related snippets in the original text: "…plus the monitor is very bright…", "…it is a wide screen, great color, great quality…", "…this lcd monitor is quite bright and clear…".
    • 48. Purpose of σrep & σread: control minimum representativeness & readability. Helps prune non-promising candidates → improves efficiency of the algorithm. Without σrep & σread we would still arrive at a solution, however: time to convergence would be much longer and results could be skewed. These parameters need to be set correctly!
    • 49. [Graphs: ROUGE-2 scores at various σread settings (-4 to -1) and at various σrep settings (1 to 5).] Performance is stable except in extreme conditions, when the thresholds are too restrictive.
    • 50. [Same graphs as slide 49.] Ideal setting: σread between -4 and -2; σrep between 1 and 4.
    • 51. Questionnaire [score 1-5]; score < 3 = poor, score > 3 = good. Grammaticality [DUC 2005]: WebNgram 4.2/5, TfIDF 2.0/5, Human 4.7/5. Non-redundancy [DUC 2005]: WebNgram 3.9/5, TfIDF 2.3/5, Human 4.5/5. Informativeness [Filippova 2010]: WebNgram 3.2/5, TfIDF 1.7/5, Human 3.6/5.
    • 52. TFIDF: poor scores on all 3 aspects (all below 3.0).
    • 53. WebNGram: all scores above 3.0; close to the human scores.
    • 54. A subjective aspect: human summaries scored slightly better than WebNgram.
    • 55. Semantic redundancies: "Good sound quality" and "Excellent audio" are treated as different even though they are semantically similar, because surface words are used directly. Solution: opinion/phrase normalization before summarizing. Some phrases are not opinion phrases: "I bought this for Christmas" is grammatical & representative, due to our candidate selection strategy. ▪ Plus side: very general approach, can summarize any text. ▪ Down side: summaries may be too general. Solution: stricter selection of phrases, e.g. select only opinion-containing phrases.
    • 56. Example summary for the Canon Powershot SX120 IS: easy to use; good picture quality; crisp and clear; good video quality. Useful for pushing opinions to devices where the screen is small (e-readers/tablets, smart phones, cell phones).
    • 57.  Optimization framework to generate ultra-concise summaries of opinions  Emphasis: representativeness, readability & compactness Evaluation shows our summaries are:  Well-formed and convey essential information  More effective than other competing methods Our approach is unsupervised, lightweight & general  Can summarize any other textual content (e.g. news articles, tweets, user comments, etc.)
