Tutorial on Automatic Summarization
by Shilpa Subrahmanyam
Prepared as an assignment for CS410: Text Information Systems in Spring 2016
The World Today
• We live in an age in which a massive amount
of content is available at our fingertips
– 500 million tweets are posted every day
– More than 2 million articles are posted daily
– The average article length at the New York Times
is about 1,200 words.
• Automatic summarization can prove
extremely useful for surfacing insights and
themes from such large corpora.
What is Automatic Summarization?
• Definition: Employing an algorithm to distill a
text corpus to a considerably smaller body of
important ideas, sentences, phrases, etc.
• Key Challenges:
– Determining how to rank the importance of a
sentence, word, or phrase.
– Eliminating and capitalizing on redundancy
– Incorporating sentiment into the summary
– Ensuring the summary is readable
– Avoiding a search through an exponential
solution space
Types of Automatic Summarization
• Extractive Summarization
– Select a subset of existing words, phrases, or
sentences in the original text in order to
generate a summary.
• Abstractive Summarization
– Aims to create a summary that is closer to
what a human might generate.
– Phrases in the summary don’t necessarily
need to have appeared in the original text
• Keyword/Key-phrase extraction
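To make the extractive case concrete, here is a minimal sketch that scores each sentence by the average corpus frequency of its words and keeps the top-scoring ones. The stop-word list, the naive sentence splitter, and the function names are assumptions made for illustration; none of them come from the papers covered in this tutorial.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "it", "that", "for", "on"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def extractive_summary(text, n_sentences=1):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore original order
    return [sentences[i] for i in chosen]

text = ("The battery life is great. "
        "The battery charges fast and the battery lasts. "
        "I like the color.")
print(extractive_summary(text, 1))
```

Every sentence in the output is copied verbatim from the input, which is what distinguishes the extractive setting from the abstractive one.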
Applications and Use Cases
• Summarizing tweets in order to determine
the timeline of a sports game
• Product owners summarizing highly
redundant product reviews (in order to get
popular opinions and key insights)
• Distilling a long article to a set of key
Summarization of Small Data
(e.g., tweets, microblogs)
• These summarization methods target tweets, microblogs, and other
content-limited small data.
• Much recent work has focused on this area because Twitter and
similar sites that favor short posts have made large amounts of
content-limited data readily available.
Basic Approaches
• Topical Keyphrase Extraction from
Twitter (Zhao, et al.)
• Summarizing Microblogs Automatically
(Sharifi, et al.)
Topical Keyphrase Extraction from Twitter
(Xin Zhao, Jiang, He, Song,
Achananuparp, Lim, Li)
• The method proposed by the authors is a
context-sensitive topical PageRank method
for keyword ranking.
• This is paired with a probabilistic scoring
function that considers two factors of key
phrases when doing key-phrase ranking:
– relevance
– interestingness
Key idea: Generate a list of topical key phrases that will serve as a
summarization of a corpus of tweets.
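As a rough illustration of graph-based keyword ranking, the sketch below runs plain PageRank over a word co-occurrence graph. This is a simplification: it is not the context-sensitive topical PageRank proposed by the authors, and it omits the relevance and interestingness scoring. The window size, damping factor, and function names are assumptions.

```python
from collections import defaultdict

def cooccurrence_graph(docs, window=2):
    """Undirected weighted graph linking words that occur within `window` positions."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor m passes on its rank in proportion to the edge weight m -> n.
            incoming = sum(
                rank[m] * graph[m][n] / sum(graph[m].values())
                for m in graph[n]
            )
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank

tweets = [
    ["world", "cup", "final"],
    ["world", "cup", "goal"],
    ["cup", "final", "goal"],
]
rank = pagerank(cooccurrence_graph(tweets))
print(sorted(rank, key=rank.get, reverse=True))
```

Words that co-occur with many other well-connected words rise to the top; a topical variant would run a separate, biased walk per topic.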
Summarizing Microblogs Automatically
(Sharifi, Hutton, Kalita)
• Start with a topic or phrase and collect tweets that are related to that
topic or phrase.
• Isolate the longest sentence in each tweet that contains the topic phrase.
This set of sentences is the input to the algorithm.
• Build a graph representing the common sequences of words (the common
phrases) that occur before and after the key topic phrase.
– The root node represents the topic phrase.
– Each word is represented by a node and a count that indicates how
many times the word occurs within the set of input sequences. A
phrase is represented in the graph by a sequence of nodes starting
with the root.
Key Idea: take a trending phrase, collect a huge number of tweets
containing that trending phrase, and provide an automatically generated
summary of the tweets that were collected.
Summarizing Microblogs Automatically
(Sharifi, Hutton, Kalita)
• Assign a weight to each node. This prevents
longer phrases from dominating the output.
– Words are given weights that are proportional to their count.
• Construct a “partial summary” by searching the graph for
the path with the largest total weight (the search covers all paths
that begin with the root node and end with a non-root node).
– This path represents the most common phrase occurring
either before or after the topic phrase.
• Run the algorithm once more. This time, initialize the root node
with the partial summary and rebuild the graph. The most heavily
weighted path in the new graph is the final
summary produced by the algorithm.
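The graph search described above can be sketched for one side of the topic phrase. The code below is a simplified illustration: it handles only the words that follow a single-word topic, and the depth penalty in `node_weight` and the `min_count` cutoff are assumptions chosen for illustration, not necessarily the weighting used in the paper. The second pass, which rebuilds the graph around the partial summary, is omitted.

```python
import math

def build_trie(sentences, topic):
    """Trie of the words that follow `topic`; every node stores an occurrence count."""
    root = {"count": 0, "children": {}}
    for sentence in sentences:
        words = sentence.lower().split()
        if topic not in words:
            continue
        root["count"] += 1
        node = root
        for word in words[words.index(topic) + 1:]:
            node = node["children"].setdefault(word, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def node_weight(count, depth, min_count=2):
    # Assumed weighting: grows with the count, shrinks with distance from the root.
    if count < min_count:
        return 0.0
    return count - depth * math.log(count)

def best_path(node, depth=1):
    """Return (total_weight, words) for the heaviest path starting below `node`."""
    best = (0.0, [])
    for word, child in node["children"].items():
        sub_weight, sub_words = best_path(child, depth + 1)
        candidate = (node_weight(child["count"], depth) + sub_weight, [word] + sub_words)
        if candidate[0] > best[0]:
            best = candidate
    return best

def summarize_after(sentences, topic):
    _, words = best_path(build_trie(sentences, topic))
    return " ".join([topic] + words)

tweets = [
    "soccer fans celebrate the win",
    "today soccer fans celebrate",
    "soccer fans are loud",
    "i love soccer",
]
print(summarize_after(tweets, "soccer"))
```

Because rare continuations get zero weight, the path stops as soon as the tweets stop agreeing with each other.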
Why are these approaches suboptimal?
• Topical Key phrase Extraction from Twitter
– Key phrase extraction can often produce noisy results (i.e.
key phrases that are common but don’t help to identify
– Moreover, a list of key phrases alone is often not sufficient
for summarization purposes.
• More detail may be required for a sufficient grasp of the original text.
• Summarizing Microblogs Automatically
– This approach is predicated on the fact that we specifically
retrieve tweets that all pertain to the same trending phrase
as an input to our algorithm.
– Extracts only the longest sentence that contains the topic
keyword(s) from each tweet as an input to the graph
• Equating sentence length with importance could prove
fallacious. Furthermore, this could mean discarding valuable text.
What are some more advanced approaches?
• Twitter Topic Summarization by
Ranking Tweets Using Social Influence
and Content Quality (YaJuan, et al.)
• Summarizing Sporting Events Using
Twitter (Nichols, et al.)
• Sumblr: continuous summarization of
evolving tweet streams (Shou, et al.)
Twitter Topic Summarization by Ranking
Tweets Using Social Influence and
Content Quality (YaJuan, ZhuMin, FuRu,
Ming, Heung-Yeung)
• This approach takes advantage of follower-followee
relationships on Twitter, which are the main signal from which the
social influence of users is inferred. The quality of tweets is
judged based on a few factors that are incorporated into the
graph-based ranking algorithm:
– readability
– content richness
– a measure of the regularity of written language
– pointless degree of the content.
• In order to curb redundancy within the final summary, the
model selects tweets from the ranking results using a Maximal
Marginal Relevance algorithm (Carbonell and Goldstein, 1998).
Key idea: Algorithm models and formulates the ranking of tweets in a
unified mutual reinforcement graph.
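The redundancy-control step can be sketched with a generic Maximal Marginal Relevance loop. The relevance scores are assumed to come from an upstream ranker, and Jaccard overlap is used as a stand-in similarity measure; both choices, along with the trade-off parameter `lam`, are illustrative assumptions and are not taken from the paper.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def mmr_select(candidates, relevance, k, lam=0.7):
    """Greedily pick k items, trading relevance against similarity to earlier picks."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max(
                (jaccard(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

tweets = [
    ["goal", "by", "messi"],
    ["goal", "by", "messi", "wow"],
    ["red", "card", "for", "ramos"],
]
scores = [0.9, 0.85, 0.6]  # hypothetical ranking scores
print(mmr_select(tweets, scores, k=2))
```

With `lam=1.0` the loop degenerates to plain top-k by relevance and returns the two near-duplicate tweets; lowering `lam` makes it prefer the less relevant but novel third tweet.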
Twitter Topic Summarization by Ranking
Tweets Using Social Influence and
Content Quality (YaJuan, ZhuMin, FuRu,
Ming, Heung-Yeung)
• The algorithm models the problem of tweet
ranking in a unified mutual reinforcement graph.
– In this model, the social influence of users and a
measure of the quality of the tweet content
are both taken into consideration in a
mutually reinforcing manner.
Summarizing Sporting Events Using
Twitter (Nichols, Mahmud, Drews)
• Takes advantage of the fact that throughout the course
of sports games, viewers generally tend to make Twitter
updates expressing opinions about different events that
occur throughout the game.
• Aims to generate a natural summary of the event that
incorporates temporal cues, such as spikes in the
volume of status updates, in order to identify important
moments throughout the course of the game.
• Aims to implement a sentence ranking method that is
used to extract relevant sentences from the tweet
corpus -- each presumably referring to an important
moment in the game.
Key idea: Summarize sporting events from a live corpus of tweets.
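A small sketch of the temporal-cue idea: flag time buckets whose tweet volume is far above the typical level, then pick one representative tweet from a flagged bucket. It assumes tweets have already been grouped into per-minute buckets; the threshold of three times the median and the word-frequency scoring are illustrative assumptions rather than the paper's exact method.

```python
from collections import Counter
from statistics import median

def find_spikes(counts, factor=3.0):
    """Indices of time buckets whose volume exceeds `factor` times the median volume."""
    baseline = median(counts)
    return [i for i, c in enumerate(counts) if c > factor * baseline]

def representative_tweet(tweets):
    """Pick the tweet whose words are, on average, most frequent within the bucket."""
    freq = Counter(w for t in tweets for w in t.lower().split())

    def score(tweet):
        words = tweet.lower().split()
        return sum(freq[w] for w in words) / len(words)

    return max(tweets, key=score)

per_minute_counts = [10, 12, 11, 50, 13, 9, 70, 12]
print(find_spikes(per_minute_counts))

spike_bucket = ["goal by messi", "what a goal", "messi goal amazing goal"]
print(representative_tweet(spike_bucket))
```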
Sumblr: continuous summarization of
evolving tweet streams (Shou, Wang, Ke
Chen, Gang Chen)
• Traditional automatic summarization
methods for text documents primarily focus
on static and small-scale data.
• Sumblr (SUMmarization By stream
cLusteRing) aims to summarize tweet streams
-- thereby providing a dynamic
summarization framework.
Key idea: Timeline-based framework for topic summarization of tweets.
The algorithm ranks and selects a diverse set of important tweets within several
sub-topic groups. These tweets serve as the basis of the summary that
is composed for each sub-topic.
Sumblr: continuous summarization of
evolving tweet streams (Shou, Wang, Ke
Chen, Gang Chen)
• During tweet stream clustering, it is necessary to maintain
statistics for tweets to facilitate summary generation. For
this reason, the authors of the paper introduce a
representation called the “tweet cluster vector” (TCV).
• The Sumblr framework operates as follows:
– At the start of the stream, collect a small number of
tweets and use a k-means clustering algorithm to create
the initial clusters. The corresponding TCVs are initialized.
– Incrementally update the TCVs whenever a new tweet
arrives. At various points in time, the algorithm has to
decide whether to create a new cluster, add a tweet to an
existing cluster, or merge/delete existing clusters.
– A high-level summarization step produces online and
historical summaries.
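The incremental update step can be illustrated with a stripped-down cluster statistic. `ClusterStats` below is a simplified stand-in for a TCV that keeps only a term-count sum, a tweet count, and a timestamp sum; the real TCV holds more fields. The similarity threshold `min_sim` and all names are assumptions, and the k-means initialization and merge/delete logic are left out.

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class ClusterStats:
    """Simplified stand-in for a tweet cluster vector (TCV)."""

    def __init__(self):
        self.term_sum = Counter()  # summed term counts of member tweets
        self.n = 0                 # number of member tweets
        self.time_sum = 0.0        # summed timestamps (gives the mean arrival time)

    def add(self, tokens, timestamp):
        # Additive statistics: a new tweet updates the summary in place.
        self.term_sum.update(tokens)
        self.n += 1
        self.time_sum += timestamp

    def centroid(self):
        return {t: c / self.n for t, c in self.term_sum.items()}

def assign(clusters, tokens, timestamp, min_sim=0.3):
    """Add the tweet to its most similar cluster, or open a new cluster."""
    vec = Counter(tokens)
    best, best_sim = None, 0.0
    for cluster in clusters:
        sim = cosine(vec, cluster.centroid())
        if sim > best_sim:
            best, best_sim = cluster, sim
    if best is None or best_sim < min_sim:
        best = ClusterStats()
        clusters.append(best)
    best.add(tokens, timestamp)
    return best

clusters = []
assign(clusters, ["goal", "messi"], 1)
assign(clusters, ["messi", "goal", "again"], 2)
assign(clusters, ["rain", "delay"], 3)
print(len(clusters), [c.n for c in clusters])
```

Keeping only additive statistics is what lets the stream be processed in one pass without storing every tweet.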
Comparison of Microblog Summarization Methods
Paper: Twitter Topic Summarization by Ranking Tweets Using Social Influence and Content Quality
– Pros: Takes advantage of the social influence of authors when ranking tweets; takes readability, content richness, a measure of the regularity of written language, and how pointless the content is into account.
– Cons: Algorithm structure could thwart the summarization of more niche topics that have sparse follower-followee adjacency matrices.
Paper: Summarizing Sporting Events Using Twitter
– Pros: Incorporates temporal cues; deals with a live corpus.
– Cons: Does not use valuable social metadata to rank tweets.
Paper: Topical Keyphrase Extraction from Twitter
– Pros: Considers both relevance and interestingness.
– Cons: Key phrase extraction may not be as helpful as full sentence summarization for some use cases, especially for data that exhibits low topical phrase
Paper: Sumblr: continuous summarization of evolving tweet streams
– Pros: Provides a streaming summarization (as well as historical summaries).
– Cons: Implementation is more complicated.
Paper: Summarizing Microblogs Automatically
– Pros: Graph algorithm’s relevance calculation does not let long sentences have an unfair advantage over shorter sentences with just as much important
– Cons: Extracts only the longest sentence that contains the topic keyword(s) from each tweet as an input to the graph.
Summarization of Larger Data
• The following summarization methods
can also be applied to larger data,
such as reviews, documents,
news articles, and so forth.
Basic Approach
• Extraction based approach for text
summarization using k-means
clustering (Agrawal, et al.)
Extraction based approach for text
summarization using k-means clustering
(Agrawal, Gupta)
• At a high level, the algorithm proposed by the
authors of this paper is an unsupervised learning
approach that can be broken down into the following steps:
– tokenization of the document
– computing a score for each sentence
– clustering the sentences using k-means
– extracting important sentences
– combining those sentences in order to form a summary
Key idea: incorporates k-means clustering, TF-IDF, and tokenization in
order to perform extractive text summarization.
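The pipeline above can be sketched end to end in plain Python. This is a minimal illustration under several assumptions: k-means is seeded with the first k sentences so the run is deterministic, cosine similarity is used for assignment, and the sentence closest to each cluster mean is extracted. The paper's own scoring and extraction rules may differ.

```python
import math
import re
from collections import Counter

def tfidf_vectors(sentences):
    docs = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    df = Counter(t for d in docs for t in d)  # document frequency per term
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in d.items()} for d in docs]

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans(vectors, k, iterations=10):
    # Assumed seeding: the first k sentences act as the initial centroids.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {t: x / len(members) for t, x in total.items()}
    return labels, centroids

def summarize(sentences, k=2):
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    picked = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            # Extract the sentence closest to the cluster mean.
            picked.append(max(members, key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(picked)]

reviews = [
    "The battery lasts all day.",
    "The screen is bright and sharp.",
    "Battery life lasts long.",
    "Sharp screen and bright colors.",
]
print(summarize(reviews, k=2))
```

One sentence is returned per cluster, so the summary length is controlled by k.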
What makes this approach suboptimal?
• Does not take advantage of redundancy to
rank importance.
• Method for extraction of important
sentence(s) from each cluster is naïve
and can be gamed.
More Advanced Approaches
• Product review summarization from a
deeper perspective (Ly, et al.)
• Mining and Summarizing Customer
Reviews (Hu, et al.)
• Micropinion Generation: An Unsupervised
Approach to Generating Ultra-Concise
Summaries of Opinions (Ganesan, et al.)
• Opinosis: A Graph-Based Approach to
Abstractive Summarization of Highly
Redundant Opinions (Ganesan, et al.)
Product review summarization from a
deeper perspective (Ly, Sugiyama, Lin,
et al.)
• The first step is Product Facet Identification.
– In order to identify candidate facets, we need to
preprocess the input reviews.
• This involves tagging part-of-speech, stemming,
assigning syntactic rules, and stop word removal.
– We then deploy the Stanford Dependency Parser
in order to detect the role of each noun.
• We want to discard nouns that aren’t subjects or objects.
• We then use association rule mining to identify
frequent product facets.
Key idea: algorithm automatically summarizes a massive collection of
product reviews and generates a concise, non-redundant summary. Not
only does this system extract review sentiments but it also extracts the
underlying justifications behind the review sentiments.
Product review summarization from a
deeper perspective (Ly, Sugiyama, Lin,
et al.)
• The second step is summarization.
– For each of the facets mined in the previous step, we
want to associate it with relevant opinion sentences
that match the appropriate polarity expressed by the
majority of the opinions in the reference text.
– We first restrict our algorithm to run only on
opinionated sentences from the reviews.
• Furthermore, we perform sentiment analysis on the
sentences to assign a polarity score to each sentence (the
sum of the polarity of each word in a sentence).
– We then calculate content-based pairwise similarities
between all of the resultant opinion sentences. Using
these scores, we perform clustering on the sentences.
– The final task is to select the most representative
sentence from each cluster for the final summary.
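The polarity step can be sketched with a toy lexicon. `LEXICON` below is a hypothetical word list invented for illustration; a real system would use an established sentiment resource. The sketch scores a sentence as the sum of its word polarities and keeps the sentences that agree with the majority polarity.

```python
# Hypothetical toy lexicon; a real system would load a proper sentiment resource.
LEXICON = {
    "great": 1, "good": 1, "happy": 1, "love": 1,
    "bad": -1, "poor": -1, "terrible": -1, "hate": -1,
}

def polarity(sentence):
    """Sentence polarity as the sum of the polarities of its words."""
    return sum(LEXICON.get(w, 0) for w in sentence.lower().split())

def sign(x):
    return (x > 0) - (x < 0)

def majority_polarity(sentences):
    return sign(sum(sign(polarity(s)) for s in sentences))

def matching_sentences(sentences):
    """Keep only sentences whose polarity agrees with the majority opinion."""
    target = majority_polarity(sentences)
    return [s for s in sentences if sign(polarity(s)) == target]

reviews = ["great battery", "battery is good", "terrible battery"]
print(matching_sentences(reviews))
print(polarity("I am happy"), polarity("I am not happy"))
```

The last line prints the same score for both sentences, since a plain word-level sum ignores negation.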
Mining and Summarizing Customer Reviews
(Minqing Hu, Bing Liu)
• The paper focuses on the problem of feature-based
summaries of customer reviews of products sold
online. In this context, “features” refers to product
attributes.
• Given a customer review corpus that pertains to a
given product, summarization is split into three
steps:
– First, we must identify the product features that customers
are speaking about.
– Second, for each feature, we have to identify sentences in
the reviews that have positive or negative opinions.
– Last, we must produce a summary that aggregates all of
the results.
Key ideas: Algorithm assists merchants in extracting the main ideas and
themes from hundreds, if not thousands, of customer reviews through
product feature extraction and consideration of sentence sentiment
Mining and Summarizing Customer Reviews
(Minqing Hu, Bing Liu)
[Figure: Summarization system architecture]
Micropinion Generation: An Unsupervised
Approach to Generating Ultra-Concise
Summaries of Opinions (Ganesan, Zhai, et al.)
Key ideas: greedy approach that heuristically prunes the exponential
solution space so that we only have to deal with promising candidates.
Ultimate goal is to generate a compact and informative summary using a
set of micro-opinions.
• Micro-opinion: a 2-5 word phrase
• Formal problem set-up:
– Suppose we have a set of sentences $Z = \{z_i\}$, $i \in [1, k]$, from
an opinion document.
– The goal is to generate a micro-opinion summary $M = \{m_i\}$, $i \in [1, k]$,
where $|m_i| \in [2, 5]$ and each $m_i$ conveys a key opinion from $Z$.
– It is important to note that while we require that $m_i$ use
words that occur at least once in the set $Z$, we do not require
$m_i$ to be an exact subsequence of any of the sentences in $Z$.
• Thus, this set-up is more of an abstractive summarization
problem than an extractive summarization problem.
Micropinion Generation: An Unsupervised
Approach to Generating Ultra-Concise
Summaries of Opinions (Ganesan, Zhai, et al.)
• Algorithm:
– Start with a set of high frequency unigrams from the
original corpus.
– Then, start to merge these unigrams to generate higher
order bigrams, trigrams, and n-grams.
– At each merge step, we make sure that the candidate n-grams
have reasonably high readability and
representativeness scores.
– The candidate generation process stops when an attempt
to grow an existing candidate leads to low readability or
representativeness scores.
– The final step is to sort all the candidate n-grams based on
their objective function values (i.e., the sum of $S_{\text{representativeness}}$
and $S_{\text{readability}}$) and generate a micro-opinion summary $M$
by gradually adding phrases with the highest scores to our
summary until the accumulated summary length reaches
the length threshold.
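The greedy growth procedure can be sketched with crude proxy scores. In the sketch below, representativeness is approximated by the mean unigram count of a phrase and readability by its weakest bigram count. These proxies, the `min_support` cutoff, and the word budget are assumptions for illustration only and do not reproduce the paper's scoring functions.

```python
from collections import Counter

def micropinions(sentences, min_support=2, max_len=5, max_words=2):
    tokens = [s.lower().split() for s in sentences]
    unigrams = Counter(w for t in tokens for w in t)
    bigrams = Counter(p for t in tokens for p in zip(t, t[1:]))
    seeds = [w for w, c in unigrams.items() if c >= min_support]

    def score(phrase):
        # Proxy objective: representativeness + readability (both are assumptions).
        representativeness = sum(unigrams[w] for w in phrase) / len(phrase)
        readability = min(bigrams[p] for p in zip(phrase, phrase[1:]))
        return representativeness + readability

    # Grow candidates one word at a time; pruning weak joins keeps the search small.
    frontier = [(w,) for w in seeds]
    candidates = []
    while frontier:
        phrase = frontier.pop()
        if len(phrase) >= 2:
            candidates.append(phrase)
        if len(phrase) == max_len:
            continue
        for w in seeds:
            if w not in phrase and bigrams[(phrase[-1], w)] >= min_support:
                frontier.append(phrase + (w,))

    # Add the best-scoring phrases until the word budget is used up.
    candidates.sort(key=score, reverse=True)
    summary, used = [], 0
    for phrase in candidates:
        if used + len(phrase) > max_words:
            break
        summary.append(" ".join(phrase))
        used += len(phrase)
    return summary

reviews = [
    "battery life is great",
    "great battery life",
    "battery life is short",
    "screen is great",
]
print(micropinions(reviews))
```

A candidate is only extended when the joining bigram is well supported, which is the pruning that keeps the search away from the full exponential space of word sequences.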
Opinosis: A Graph-Based Approach to
Abstractive Summarization of Highly
Redundant Opinions (Ganesan, Zhai, Han)
• The results of the evaluation studies show that when compared
to the baseline extractive method, the Opinosis summaries are
closer to human summaries.
• The high level picture of the algorithm: generate an abstractive
summary by repeatedly searching the Opinosis graph for sub-graphs
that represent semantically valid and
meaningful sentences with high redundancy scores.
• It is important that these sentences have high redundancy
scores because that means that they are representative of a
major opinion.
• The sentences that are represented by these sub-graphs can be
combined to form an abstractive summary.
Key idea: graph-based approach to automatic text summarization. The
summarization framework generates concise abstractive summaries and
capitalizes on the presence of large amounts of redundancy in the
original text.
Opinosis: A Graph-Based Approach to
Abstractive Summarization of Highly
Redundant Opinions (Ganesan, Zhai, Han)
• Opinosis constructs a graph that represents the
original text. The authors isolate three properties of
this graph that they exploit in order to explore and
score various sub-paths through the graph.
These sub-paths are what help to generate the
candidate abstractive summaries.
– Properties:
• Redundancy Capture: extremely redundant textual
occurrences are naturally captured by sub-graphs.
• Gapped Subsequence Capture: existing sentence structures
create “lexical links”. These links then facilitate the discovery
of new sentences.
• Collapsible Structures: nodes that resemble hubs can
potentially be collapsed
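The first two properties can be illustrated with a small word graph. In this sketch each node is a bare word that records where it occurs as (sentence id, position) pairs, and a path's redundancy is the number of sentences that cover it in order while tolerating small gaps. Using bare words as nodes, the `max_gap` value, and the function names are assumptions for illustration; path scoring and collapsing of hub nodes are not shown.

```python
from collections import defaultdict

def build_graph(sentences):
    """Record (sentence_id, position) references per word and link adjacent words."""
    refs = defaultdict(list)
    edges = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        words = sentence.lower().split()
        for pos, word in enumerate(words):
            refs[word].append((sid, pos))
            if pos + 1 < len(words):
                edges[word].add(words[pos + 1])
    return refs, edges

def path_redundancy(path, refs, max_gap=2):
    """Number of sentences that contain `path` in order, allowing gaps up to `max_gap`."""
    current = set(refs[path[0]])
    for word in path[1:]:
        following = set()
        for sid, pos in refs[word]:
            # Keep this occurrence if an earlier word of the path precedes it closely.
            if any(s == sid and 0 < pos - p <= max_gap for s, p in current):
                following.add((sid, pos))
        current = following
    return len({sid for sid, _ in current})

reviews = [
    "the battery life is very good",
    "battery life is good",
    "the screen is good",
]
refs, edges = build_graph(reviews)
print(path_redundancy(["battery", "life", "is", "good"], refs))
print(path_redundancy(["is", "good"], refs))
```

The path "battery life is good" is supported by the first review even though that review says "is very good", which is the gapped subsequence behavior described above.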
Comparison of Methods for Larger Data
Paper: Mining and Summarizing Customer Reviews
– Pros: Takes sentiment into consideration.
– Cons: Algorithm is restricted to run only on opinionated sentences. This could discard potentially valuable text.
Paper: Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions
– Pros: Aims to capitalize on existing redundancy and maximize readability; abstractive summarization aims to mimic human summarization; prunes unpromising candidates.
– Cons: Does not take sentiment into consideration; essentially provides key phrases, which may not be optimal for all use cases.
Paper: Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions
– Pros: Capitalizes on redundancy; emphasizes readability; abstractive summarization mimics human summarization.
– Cons: Does not take sentiment into consideration.
Paper: Extraction based approach for text summarization using k-means clustering
– Pros: Implementation is simple and
– Cons: Does not take sentiment into consideration; doesn’t take advantage of redundancy to rank importance.
Paper: Product review summarization from a deeper perspective
– Pros: Takes sentiment into consideration.
– Cons: Sentiment consideration needs to be more sophisticated in order to account for complex English phrase structure (e.g., the sentences “I am happy” and “I am not happy” should have an extremely high sentiment differential. This might not happen under this approach.)
Future Strides
• Ideally, we want to push towards better
abstractive summarization approaches.
– We want to emulate human summarization as
closely as possible
• Applications of deep learning to automatic summarization
• Highly visual automatic summarizations
Thanks for your

Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...HopeBay Technologies, Inc.
Netizen style commenting on fashion photos
Netizen style commenting on fashion photosNetizen style commenting on fashion photos
Netizen style commenting on fashion photosJason Tang
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...Daniel Katz
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
Text analytics in social media
Text analytics in social mediaText analytics in social media
Text analytics in social mediaJeremiah Fadugba
Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...eSAT Publishing House
Explaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance SummarizationExplaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance Summarizationmiajang
Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)
Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)
Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)Semantic Web Company
TextMiningTwittersLiu Chang
Tweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A SurveyTweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A Surveyvivatechijri

Similar to Automatic Summarizaton Tutorial (20)

Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank SummarizationTopic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Ire major project
Ire major projectIre major project
Ire major project
On Summarization and Timeline Generation for Evolutionary Tweet Streams
On Summarization and Timeline Generation for Evolutionary Tweet StreamsOn Summarization and Timeline Generation for Evolutionary Tweet Streams
On Summarization and Timeline Generation for Evolutionary Tweet Streams
Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...Emerging topic detection on twitter based on temporal and social terms evalua...
Emerging topic detection on twitter based on temporal and social terms evalua...
Netizen style commenting on fashion photos
Netizen style commenting on fashion photosNetizen style commenting on fashion photos
Netizen style commenting on fashion photos
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...
ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Profe...
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
Final presentation
Final presentationFinal presentation
Final presentation
Text analytics in social media
Text analytics in social mediaText analytics in social media
Text analytics in social media
Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...Summarization using ntc approach based on keyword extraction for discussion f...
Summarization using ntc approach based on keyword extraction for discussion f...
Explaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance SummarizationExplaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance Summarization
Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)
Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)
Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)
Tweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A SurveyTweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A Survey

Recently uploaded

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27

Recently uploaded (20)

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx

Automatic Summarization Tutorial

  • 1. Tutorial on Automatic Summarization, by Shilpa Subrahmanyam. Prepared as an assignment for CS410: Text Information Systems in Spring 2016
  • 2. The World Today
      • We live in an age in which a massive amount of content is available at our fingertips
          – 500 million tweets are posted every day
          – More than 2 million articles are posted daily
          – The average article length at the New York Times is about 1,200 words
      • Automatic summarization can prove extremely useful for generating insights and themes from such large corpora of data.
  • 3. What is Automatic Summarization?
      • Definition: Employing an algorithm to distill a text corpus to a considerably smaller body of important ideas, sentences, phrases, etc.
      • Key Challenges:
          – Determining how to rank the importance of a sentence, word, or phrase
          – Eliminating and capitalizing on redundancy
          – Incorporating sentiment into the summary
          – Ensuring the summary is readable
          – Avoiding a search through an exponential solution space
  • 4. Types of Automatic Summarization
      • Extractive Summarization
          – Select a subset of existing words, phrases, or sentences in the original text in order to generate a summary.
      • Abstractive Summarization
          – Aims to create a summary that is closer to what a human might generate.
          – Phrases in the summary don’t necessarily need to have appeared in the original text.
      • Keyword/Key-phrase extraction
  • 5. Applications and Use Cases
      • Summarizing tweets in order to determine the timeline of a sports game
      • Product owners summarizing highly redundant product reviews (in order to get popular opinions and key insights)
      • Distilling a long article to a set of key points
  • 6. Summarization of Small Data (e.g., tweets and microblogs)
      • These are summarization methods that are targeted at tweets, microblogs, and other content-limited data.
      • A lot of recent work targets this area because Twitter and similar sites that favor short posts have made large volumes of content-limited data readily available.
  • 7. Basic Approaches
      • Topical Keyphrase Extraction from Twitter (Zhao et al.)
      • Summarizing Microblogs Automatically (Sharifi et al.)
  • 8. Topical Keyphrase Extraction from Twitter (Xin Zhao, Jiang, He, Song, Achananuparp, Lim, Li)
      • Key idea: Generate a list of topical key phrases that will serve as a summarization of a corpus of tweets.
      • The method proposed by the authors is a context-sensitive topical PageRank method for keyword ranking.
      • This is paired with a probabilistic scoring function that considers two factors when ranking key phrases:
          – relevance
          – interestingness
  • 9. Summarizing Microblogs Automatically (Sharifi, Hutton, Kalita)
      • Key idea: Take a trending phrase, collect a large number of tweets containing that phrase, and automatically generate a summary of the collected tweets.
      • Start with a topic or phrase and collect tweets that are related to that topic or phrase.
      • Isolate the longest sentence in each tweet that contains the topic phrase. This set of sentences is the input to the algorithm.
      • Build a graph representing the common sequences of words (the common phrases) that occur before and after the topic phrase.
          – The root node represents the topic phrase.
          – Each word is represented by a node and a count that indicates how many times the word occurs within the set of input sequences. A phrase is represented in the graph by a sequence of nodes starting with the root.
  • 10. Summarizing Microblogs Automatically (Sharifi, Hutton, Kalita)
      • Assign a weight to each node. The weighting is meant to prevent longer phrases from dominating the output.
          – Words are given weights that are proportional to their count.
      • Construct a “partial summary” by searching the graph for the path with the largest total weight (the search covers all paths that begin at the root node and end at a non-root node).
          – This path represents the most common phrase occurring either before or after the topic phrase.
      • Run the algorithm once more, this time initializing the root node with the partial summary and rebuilding the graph. The most heavily weighted path in the new graph is the final summary produced by the algorithm.
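The graph search described on slides 9 and 10 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it handles only the words that follow a single-word topic, and the weighting (`count - 1`, so that words seen only once add nothing) is an assumption chosen for the example. The names `build_suffix_tree` and `best_path` are hypothetical.

```python
def build_suffix_tree(sentences, topic):
    """Count shared word sequences that follow `topic` in each sentence."""
    root = {"count": 0, "children": {}}
    for sentence in sentences:
        words = sentence.lower().split()
        if topic not in words:
            continue
        root["count"] += 1
        node = root
        # Walk the words after the first occurrence of the topic word.
        for word in words[words.index(topic) + 1:]:
            node = node["children"].setdefault(word, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def best_path(node):
    """Return (weight, words) for the heaviest path below `node`."""
    best = (0, [])
    for word, child in node["children"].items():
        sub_weight, sub_words = best_path(child)
        # Assumed weighting: count - 1, so a word seen once contributes 0.
        weight = (child["count"] - 1) + sub_weight
        if weight > best[0]:
            best = (weight, [word] + sub_words)
    return best


tweets = [
    "ducks win the cup tonight",
    "the ducks win the cup again",
    "ducks win big",
]
weight, words = best_path(build_suffix_tree(tweets, "ducks"))
print("ducks " + " ".join(words))
```

A fuller version would build the same structure for the words before the topic and then rerun the search with the partial summary as the new root, as the second pass on slide 10 describes.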
  • 11. Why are these approaches suboptimal?
      • Topical Keyphrase Extraction from Twitter
          – Key phrase extraction can often produce noisy results (i.e., key phrases that are common but don’t help to identify themes).
          – Moreover, a list of key phrases is often not sufficient as a summary. More detail may be required for a sufficient grasp of the original text.
      • Summarizing Microblogs Automatically
          – This approach is predicated on retrieving, as input to the algorithm, tweets that all pertain to the same trending phrase.
          – It extracts only the longest sentence that contains the topic keyword(s) from each tweet as an input to the graph algorithm. Equating sentence length with importance could prove fallacious. Furthermore, this could mean discarding valuable
  • 12. What are some more advanced approaches?
      • Twitter Topic Summarization by Ranking Tweets Using Social Influence and Content Quality (YaJuan et al.)
      • Summarizing Sporting Events Using Twitter (Nichols et al.)
      • Sumblr: continuous summarization of evolving tweet streams (Shou et al.)
  • 13. Twitter Topic Summarization by Ranking Tweets Using Social Influence and Content Quality (YaJuan, ZhuMin, FuRu, Ming, Heung-Yeung)
      • Key idea: The algorithm models and formulates the ranking of tweets in a unified mutual reinforcement graph.
      • This approach takes advantage of follower-followee relationships on Twitter, which are the main signal from which users' social influence is inferred. The quality of tweets is judged based on a few factors that are incorporated into the graph-based ranking algorithm:
          – readability
          – content richness
          – a measure of the regularity of written language
          – the degree of pointlessness of the content
      • In order to curb redundancy within the final summary, the model selects tweets from the ranking results using a Maximal Marginal Relevance algorithm (Carbonell and Goldstein, 1998).
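As a rough illustration of the Maximal Marginal Relevance step, the sketch below greedily picks the item that maximizes `lam * relevance - (1 - lam) * max similarity to already selected items`. The relevance scores are assumed to come from an upstream ranker, and Jaccard word overlap is a stand-in similarity chosen for the example; neither is taken from the paper. The names `jaccard` and `mmr_select` are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two texts (a stand-in similarity measure)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def mmr_select(tweets, relevance, k, lam=0.7):
    """Greedily select k tweets, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(tweets)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (jaccard(tweets[i], tweets[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [tweets[i] for i in selected]


tweets = [
    "goal by messi in the final",
    "goal by messi in the final minute",
    "red card for ramos",
]
print(mmr_select(tweets, relevance=[0.9, 0.85, 0.5], k=2, lam=0.5))
```

With `lam=0.5` the near-duplicate second tweet is skipped in favor of the less relevant but novel third one, which is the behavior the redundancy term is meant to produce.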
  • 14. Twitter Topic Summarization by Ranking Tweets Using Social Influence and Content Quality (YaJuan, ZhuMin, FuRu, Ming, Heung-Yeung)
      • The algorithm models the problem of tweet ranking in a unified mutual reinforcement graph.
          – In this model, the social influence of users and a measure of the quality of the tweet content are both taken into consideration (simultaneously, in a mutually reinforcing manner).
  • 15. Summarizing Sporting Events Using Twitter (Nichols, Mahmud, Drews)
      • Key idea: Summarize sporting events from a live corpus of tweets.
      • Takes advantage of the fact that, throughout a sports game, viewers tend to post Twitter updates expressing opinions about the events that occur.
      • Aims to generate a natural summary of the event that incorporates temporal cues, such as spikes in the volume of status updates, in order to identify important moments throughout the course of the game.
      • Uses a sentence ranking method to extract relevant sentences from the tweet corpus, each presumably referring to an important moment in the game.
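The temporal cue mentioned above, spikes in tweet volume, can be illustrated with a very small sketch. The rule used here (flag a time bucket when its count exceeds a multiple of the median) is an assumption made for the example and is not the thresholding rule from the paper; `find_spikes` and `factor` are hypothetical names.

```python
from statistics import median


def find_spikes(counts, factor=3.0):
    """Return indices of time buckets whose volume exceeds factor * median."""
    baseline = median(counts)
    return [i for i, c in enumerate(counts) if c > factor * baseline]


# Tweets per minute during a hypothetical match.
per_minute = [10, 12, 11, 80, 13, 9, 95, 12]
print(find_spikes(per_minute))  # → [3, 6]
```

Each flagged bucket would then be passed to the sentence ranking step to pick the tweets that describe that moment.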
  • 16. Sumblr: continuous summarization of evolving tweet streams (Shou, Wang, Ke Chen, Gang Chen)
      • Key idea: A timeline-based framework for topic summarization of tweets. The algorithm ranks and selects a diverse set of important tweets within several sub-topic groups. These tweets serve as the basis of the summary composed for each sub-topic.
      • Traditional automatic summarization methods for text documents primarily focus on static and small-scale data.
      • Sumblr (SUMmarization By stream cLusteRing) aims to summarize tweet streams, thereby providing a dynamic summarization framework.
  • 17. Sumblr: continuous summarization of evolving tweet streams (Shou, Wang, Ke Chen, Gang Chen)
      • During tweet stream clustering, it is necessary to maintain statistics for tweets to facilitate summary generation. For this reason, the authors of the paper introduce a representation called the “tweet cluster vector” (TCV).
      • The Sumblr framework operates as follows:
          – At the start of the stream, collect a small number of tweets and use a k-means clustering algorithm to create the initial clusters. The corresponding TCVs are initialized.
          – Incrementally update the TCVs whenever a new tweet arrives. At various points in time, the algorithm has to decide whether to create a new cluster, add the tweet to an existing cluster, or merge/delete existing clusters.
          – A high-level summarization step produces online and historical summaries.
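The incremental update step can be sketched as follows. This is a heavily reduced stand-in for the framework on the slide: `ClusterStats` keeps only a summed word-count vector and a size rather than the full tweet cluster vector, the initial k-means pass and the merge/delete decisions are omitted, and the similarity `threshold` is an assumed parameter.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ClusterStats:
    """Running statistics for one cluster (a reduced stand-in for a TCV)."""

    def __init__(self):
        self.word_sum = Counter()
        self.size = 0

    def add(self, vec):
        self.word_sum.update(vec)
        self.size += 1


def assign(clusters, tweet, threshold=0.3):
    """Add `tweet` to the most similar cluster, or open a new one."""
    vec = Counter(tweet.lower().split())
    best, best_sim = None, 0.0
    for cluster in clusters:
        sim = cosine(vec, cluster.word_sum)
        if sim > best_sim:
            best, best_sim = cluster, sim
    if best is None or best_sim < threshold:
        best = ClusterStats()
        clusters.append(best)
    best.add(vec)
    return best


clusters = []
for t in ["messi scores a goal", "what a goal by messi",
          "new phone launch today", "phone launch event today"]:
    assign(clusters, t)
print([c.size for c in clusters])
```

Because cosine similarity is scale-invariant, comparing against the summed vector gives the same result as comparing against the cluster centroid, so only the sums need to be stored.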
  • 18. Comparison of Microblog Methods (paper, pros, cons)
      • Twitter Topic Summarization by Ranking Tweets Using Social Influence and Content Quality
          – Pros: Takes advantage of social influence of authors when ranking tweets; takes readability, content richness, a measure of the regularity of written language, and how pointless the content is into account.
          – Cons: Algorithm structure could thwart the summarization of more niche topics that have sparse follower-followee adjacency matrices.
      • Summarizing Sporting Events Using Twitter
          – Pros: Incorporates temporal cues; deals with a live corpus.
          – Cons: Does not use valuable social metadata to rank tweets.
      • Topical Keyphrase Extraction from Twitter
          – Pros: Considers both relevance and interestingness.
          – Cons: Key phrase extraction may not be as helpful as full sentence summarization for some use cases, especially for data that exhibits low topical phrase redundancies.
      • Sumblr: continuous summarization of evolving tweet streams
          – Pros: Provides a streaming summarization (as well as historical summaries).
          – Cons: Implementation is more complicated.
      • Summarizing Microblogs Automatically
          – Pros: Graph algorithm’s relevance calculation does not let long sentences have an unfair advantage over shorter sentences with just as much important
          – Cons: Extracts only the longest sentence that contains the topic keyword(s) from each tweet as an input to the graph algorithm.
  • 19. Summarization of Larger Data
      • The following summarization methods can also be applied to larger data, such as reviews, documents, and news articles.
  • 20. Basic Approach
      • Extraction based approach for text summarization using k-means clustering (Agrawal et al.)
  • 21. Extraction based approach for text summarization using k-means clustering (Agrawal, Gupta)
      • Key idea: Incorporates k-means clustering, TF-IDF, and tokenization in order to perform extractive text summarization.
      • At a high level, the algorithm proposed by the authors of this paper is an unsupervised learning approach that can be broken down into the following steps:
          – tokenizing the document
          – computing a score for each sentence
          – clustering the sentences using k-means
          – extracting important sentences
          – combining those sentences in order to form a summary
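The sentence scoring step can be illustrated with a small TF-IDF sketch. It treats each sentence as its own document and sums the TF-IDF weights of its words; this particular scoring formula is an assumption made for the example, and the k-means clustering and extraction steps are left out. `tfidf_sentence_scores` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of its words' TF-IDF weights."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(w for words in tokenized for w in set(words))
    scores = []
    for words in tokenized:
        tf = Counter(words)
        total = sum((tf[w] / len(words)) * math.log(n / df[w]) for w in tf)
        scores.append(total)
    return scores


sentences = [
    "the battery lasts long",
    "the screen is bright",
    "the the the",
]
for sentence, score in zip(sentences, tfidf_sentence_scores(sentences)):
    print(f"{score:.3f}  {sentence}")
```

Words that occur in every sentence get an IDF of zero, so a sentence made up only of such words scores zero regardless of its length.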
  • 22. What makes this approach suboptimal?
      • Does not take advantage of redundancy to rank importance.
      • The method for extracting the important sentence(s) from each cluster is naïve and can be gamed.
  • 23. More Advanced Approaches
      • Product review summarization from a deeper perspective (Ly et al.)
      • Mining and Summarizing Customer Reviews (Hu et al.)
      • Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions (Ganesan et al.)
      • Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions (Ganesan et al.)
  • 24. Product review summarization from a deeper perspective (Ly, Sugiyama, Lin, Kan)
      • Key idea: The algorithm automatically summarizes a massive collection of product reviews and generates a concise, non-redundant summary. The system extracts not only review sentiments but also the underlying justifications behind them.
      • The first step is Product Facet Identification.
          – In order to identify candidate facets, the input reviews are preprocessed. This involves part-of-speech tagging, stemming, assigning syntactic rules, and stop word removal.
          – The Stanford Dependency Parser is then used to detect the role of each noun. Nouns that aren’t subjects or objects are discarded.
      • Association rule mining is then used to identify frequent product facets.
  • 25. Product review summarization from a deeper perspective (Ly, Sugiyama, Lin, Kan)
      • The second step is summarization.
          – Each facet mined in the previous step is associated with relevant opinion sentences that match the polarity expressed by the majority of the opinions in the reference text.
          – The algorithm is first restricted to run only on opinionated sentences from the reviews. Sentiment analysis then assigns a polarity score to each sentence (the sum of the polarity of each word in the sentence).
          – Content-based pairwise similarities are then calculated between all of the resulting opinion sentences. Using these scores, the sentences are clustered.
          – The final task is to select the most representative sentence from each cluster for the final summary.
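The polarity score described above (the sum of the polarity of each word in a sentence) can be sketched in a few lines. The lexicon below is a tiny hypothetical one made up for the example; a real system would use a full sentiment lexicon.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
POLARITY = {
    "happy": 1, "great": 1, "love": 1,
    "poor": -1, "bad": -1, "terrible": -1,
}


def sentence_polarity(sentence, lexicon=POLARITY):
    """Sum the polarity of each word; unknown words count as 0."""
    return sum(lexicon.get(word, 0) for word in sentence.lower().split())


print(sentence_polarity("I am happy"))      # → 1
print(sentence_polarity("I am not happy"))  # → 1
```

Both sentences receive the same score because a plain word-level sum ignores negation, which is one reason such a score may need to be made more sophisticated.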
  • 26. Mining and Summarizing Customer Reviews (Minqing Hu, Bing Liu)
      • Key idea: The algorithm assists merchants in extracting the main ideas and themes from hundreds, if not thousands, of customer reviews through product feature extraction and consideration of sentence sentiment.
      • The paper focuses on the problem of feature-based summaries of customer reviews of products sold online. In this context, “features” refers to product attributes.
      • Given a customer review corpus that pertains to a given product, summarization is split into three subtasks:
          – First, identify the product features that customers are speaking about.
          – Second, for each feature, identify sentences in the reviews that have positive or negative opinions.
          – Last, produce a summary that aggregates all of
  • 27. Mining and Summarizing Customer Reviews (Minqing Hu, Bing Liu)
      • [Figure: Summarization system architecture]
  • 28. Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions (Ganesan, Zhai, Viegas)
      • Key idea: A greedy approach that heuristically prunes the exponential solution space so that only promising candidates need to be considered. The ultimate goal is to generate a compact and informative summary using a set of micro-opinions.
      • Micro-opinion: a 2-5 word phrase
      • Formal problem set-up:
          – Suppose we have a set of sentences Z = {z_i}, i ∈ [1, k], from an opinion document.
          – The goal is to generate a micro-opinion summary M = {m_i}, i ∈ [1, k], where |m_i| ∈ [2, 5] and each m_i conveys a key opinion from Z.
          – While each m_i must use words that occur at least once in Z, m_i is not required to be an exact subsequence of any of the sentences in Z. This makes the set-up more of an abstractive summarization problem than an extractive one.
  • 29. Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions (Ganesan, Zhai, Viegas)
      • Algorithm:
          – Start with a set of high-frequency unigrams from the original corpus.
          – Merge these unigrams to generate higher-order bigrams, trigrams, and n-grams.
          – At each merge step, make sure that the candidate n-grams have reasonably high readability and representativeness scores.
          – The candidate generation process stops when an attempt to grow an existing candidate leads to low readability or representativeness scores.
          – The final step is to sort all the candidate n-grams by their objective function values (i.e., the sum of S_representativeness and S_readability) and generate a micro-opinion summary M by gradually adding the highest-scoring phrases until the accumulated summary length reaches the length threshold.
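The greedy grow-and-prune loop can be sketched as below. Note the loud assumption: the readability and representativeness scores from the paper are replaced here by a single stand-in test (the candidate must occur contiguously at least `min_count` times in the corpus), which makes this toy version stricter and more extractive than the method on the slide. All names are hypothetical.

```python
from collections import Counter


def count_ngram(tokenized, ngram):
    """Count contiguous occurrences of `ngram` across all sentences."""
    n = len(ngram)
    return sum(
        1
        for words in tokenized
        for i in range(len(words) - n + 1)
        if tuple(words[i:i + n]) == ngram
    )


def grow_candidates(sentences, min_count=2, max_len=5):
    """Grow n-grams from frequent unigrams, pruning weak candidates."""
    tokenized = [s.lower().split() for s in sentences]
    unigrams = Counter(w for words in tokenized for w in words)
    seeds = [w for w, c in unigrams.items() if c >= min_count]
    frontier = [(w,) for w in seeds]
    kept = []
    while frontier:
        next_frontier = []
        for cand in frontier:
            if len(cand) >= max_len:
                continue
            for w in seeds:
                grown = cand + (w,)
                # Stand-in for the readability/representativeness check.
                if count_ngram(tokenized, grown) >= min_count:
                    next_frontier.append(grown)
                    kept.append(grown)
        frontier = next_frontier
    return kept


reviews = [
    "battery life is great",
    "great battery life",
    "battery life could be better",
]
print(grow_candidates(reviews))
```

Pruning at every growth step is what keeps the search away from the exponential space of all word combinations: a candidate that fails the check is never extended further.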
  • 30. Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions (Ganesan, Zhai, Han)
      • Key idea: A graph-based approach to automatic text summarization. The summarization framework generates concise abstractive summaries and capitalizes on the presence of large amounts of redundancy in the opinions.
      • The results of the evaluation studies show that, compared to the baseline extractive method, the Opinosis summaries are closer to human summaries.
      • The high-level picture of the algorithm: generate an abstractive summary by repeatedly searching the Opinosis graph for sub-graphs that represent semantically valid, meaningful sentences with high redundancy scores.
      • High redundancy scores matter because they indicate that a sentence is representative of a major opinion.
      • The sentences that are represented by these sub-graphs can be combined to form an abstractive summary.
  • 31. Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions (Ganesan, Zhai, Han)
      • Opinosis constructs a graph that represents the original text. The paper isolates three properties of this graph that the authors exploit in order to explore and score various sub-paths throughout the graph. These sub-paths are what help to generate the candidate abstractive summaries.
      • Properties:
          – Redundancy Capture: extremely redundant textual occurrences are naturally captured by sub-graphs.
          – Gapped Subsequence Capture: existing sentence structures create “lexical links”. These links then facilitate the discovery of new sentences.
          – Collapsible Structures: nodes that resemble hubs can potentially be collapsed.
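A word graph of this general kind can be sketched as follows. This is a simplified illustration under stated assumptions: nodes are bare lowercase words, each node records the (sentence id, position) pairs where it occurs, and the redundancy of a path is approximated as the number of sentences containing every word on the path. Path validity checks, gap handling, and collapsing are omitted. `build_word_graph` and `path_redundancy` are hypothetical names.

```python
from collections import defaultdict


def build_word_graph(sentences):
    """Build word nodes with position references and directed edges."""
    positions = defaultdict(list)  # word -> [(sentence_id, position), ...]
    edges = defaultdict(set)       # word -> set of words that follow it
    for sid, sentence in enumerate(sentences):
        words = sentence.lower().split()
        for pid, word in enumerate(words):
            positions[word].append((sid, pid))
            if pid + 1 < len(words):
                edges[word].add(words[pid + 1])
    return positions, edges


def path_redundancy(positions, path):
    """Count sentences that contain every word on `path` (simplified)."""
    sentence_sets = [{sid for sid, _ in positions[w]} for w in path]
    return len(set.intersection(*sentence_sets)) if sentence_sets else 0


reviews = [
    "the battery life is great",
    "battery life is very good",
    "the screen is great",
]
positions, edges = build_word_graph(reviews)
print(sorted(edges["is"]))
print(path_redundancy(positions, ["battery", "life", "is"]))
```

Because sentences that share words share nodes, repeated phrasing shows up as heavily referenced paths, and a path such as "the battery life is very good" can be read off the graph even though no single input sentence contains it.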
  • 32. Comparison of Methods for Larger Data (paper, pros, cons)
      • Mining and Summarizing Customer Reviews
          – Pros: Takes sentiment into consideration.
          – Cons: Algorithm is restricted to run only on opinionated sentences. This could discard potentially valuable text.
      • Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions
          – Pros: Aims to capitalize on existing redundancy and maximize readability; abstractive summarization aims to mimic human summarization; prunes unpromising candidates.
          – Cons: Does not take sentiment into consideration; essentially provides key phrases, which may not be optimal for all use cases.
      • Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions
          – Pros: Capitalizes on redundancy; emphasizes readability; abstractive summarization mimics human summarization.
          – Cons: Does not take sentiment into consideration.
      • Extraction based approach for text summarization using k-means clustering
          – Pros: Implementation is simple and straightforward.
          – Cons: Does not take sentiment into consideration; doesn’t take advantage of redundancy to rank importance.
      • Product review summarization from a deeper perspective
          – Pros: Takes sentiment into consideration.
          – Cons: Sentiment consideration needs to be more sophisticated in order to account for complex English phrase structure (e.g., the sentences “I am happy” and “I am not happy” should have an extremely high sentiment differential, which might not happen under this approach).
  • 33. Future Strides
      • Ideally, we want to push towards better abstractive summarization approaches.
          – We want to emulate human summarization as closely as possible.
      • Applications of deep learning to automatic summarization
      • Highly visual automatic summarizations