Proceedings NIPS Workshop on Applications for Topic Models: Text and
Beyond: Dec 2009, Whistler, Canada




               Learning to Summarize using Coherence


                    Pradipto Das                                  Rohini Srihari
            Department of Computer Science                 Department of Computer Science
                 University at Buffalo                          University at Buffalo
                  Buffalo, NY 14260                              Buffalo, NY 14260
               pdas3@buffalo.edu                          rohini@cedar.buffalo.edu



                                                Abstract

            We define a generative probabilistic topic model for text summarization that aims at
            extracting a small subset of sentences from the corpus with respect to a given query.
            We theorize that, in addition to a bag of words, a document can also be viewed in a
            different manner. Words in a sentence always carry syntactic and semantic information,
            and often such information (e.g., the grammatical and semantic role (GSR) of a word,
            such as subject, object, noun and verb concepts, etc.) is carried across adjacent
            sentences to enhance coherence in different parts of a document. We define a topic
            model that models documents by factoring in the GSR transitions for coherence, and for
            a particular query, we rank sentences by a product of thematic salience and coherence
            through GSR transitions.


   1   Introduction

   Automatic summarization is one of the oldest studied problems in IR and NLP and still receives
   prominent research attention. In this paper, we propose a new joint model of words and sentences for
   multi-document summarization that attempts to integrate the coherence as well as the latent themes
   of the documents.
   In the realm of computational linguistics, there has been a lot of work in Centering Theory including
   Grosz et al. [3]. Their work specifies how discourse interpretation depends on interactions among
   speaker intentions, attentional state, and linguistic form. In our context, we could assume a subset
   of documents discussing a particular theme to be a discourse. Attentional state models the discourse
   participants’ focus of attention at any given point in the discourse. This focus of attention helps
   identify “centers” of utterances that relate different parts of local discourse segments meaningfully
   and according to [3], the “centers” are semantic objects, not words, phrases, or syntactic forms
   and centering theory helps formalize the constraints on the centers to maximize coherence. In our
   context, the GSRs approximate the centers.
   Essentially, the propagation of these centers across utterances then helps maintain local
   coherence. It is important to note that this local coherence is responsible for the choice of
   words appearing across utterances in a particular discourse segment and helps reduce the inference
   load placed upon the hearer (or reader) in understanding the foci of attention.


   2   Adapting Centering Theory for Summarization

   For building a statistical topic model that incorporates GSR transitions (henceforth GSRts) across
   utterances, we attributed words in a sentence with GSRs like subjects, objects, concepts from WordNet
   synset role assignments (wn), adjectives, VerbNet thematic role assignments (vn), adverbs and "other"
   (if the feature of the word does not fall into the previous GSR categories). Further, if a word in a


sentence is identified with 2 or more GSRs, only one GSR is chosen based on the left to right de-
scending priority of the categories mentioned. These features (GSRs) were extracted using the text
analytics engine Semantex (http://www.janyainc.com/). Thus in a window of sentences there are potentially
(G + 1)^2 GSRts for a total of G GSRs, the additional GSR representing a null feature (denoted
by "−−") indicating that the word does not appear in the contextual sentence. We used anaphora resolution as
offered by Semantex to substitute pronouns with their referent nouns as a preprocessing step. If there
are TG valid GSRts in the corpus, then a sentence is represented as a vector of GSRt counts,
along with a binary vector over the word vocabulary. It must be emphasized that the GSRs are the
output of a separate natural language parsing system.
For further insight, we can construct a matrix consisting of sentences as rows and words as columns;
the entries in the matrix are filled up with a specific GSR for the word in the corresponding sentence
following GSR priorities (in case of multiple occurrences of the same word in the same sentence
with different GSRs). Figure 1 shows a slice of such a matrix taken from the TAC2008 dataset
(http://www.nist.gov/tac/tracks/2008/index.html) which contains documents related to events concerning Christian
minorities in Iraq and their current status. Figure 1 suggests, as in [1], that dense columns of the
GSRs indicate potentially salient and coherent sentences (7 and 8 here) that present less inference
load. The words and the GSRs jointly identify the centers in an utterance.




Figure 1: (a) Left: Sentence IDs and the GSRs of the words in them (b) Right: The corresponding
sentences

Note that the count of the GSRt "wn→ −−" for sentence ID 8 is 3 in this snapshot. Inputs to the
model are document-specific word ID counts and document- and sentence-specific GSRt ID counts.
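
To make this preprocessing concrete, the following is a minimal Python sketch (an illustration under our own assumptions, not the authors' Semantex-based pipeline) of how GSRt counts could be assembled from per-sentence GSR tags once every word carries a single, priority-resolved GSR; the data structures, priority list and toy sentences are hypothetical.

    # Minimal sketch: counting GSR transitions (GSRts) between adjacent sentences.
    # Each sentence is assumed to be a dict mapping a word to its single,
    # priority-resolved GSR; "--" denotes the null GSR (word absent from a sentence).
    from collections import Counter

    GSR_PRIORITY = ["subj", "obj", "wn", "adj", "vn", "adv", "other"]  # left-to-right priority

    def pick_gsr(tags):
        # Keep only the highest-priority GSR when a word receives several tags.
        return min(tags, key=GSR_PRIORITY.index)

    def gsrt_counts(sentences):
        # Count transitions of each shared word's GSR across adjacent sentence pairs.
        counts = Counter()
        for prev, curr in zip(sentences, sentences[1:]):
            for w in set(prev) | set(curr):
                counts[(prev.get(w, "--"), curr.get(w, "--"))] += 1
        return counts

    # Toy usage with two hypothetical sentences (word -> GSR):
    s7 = {"churches": pick_gsr(["obj", "wn"]), "Iraq": "wn", "militants": "subj"}
    s8 = {"churches": "subj", "Iraq": "wn", "attacks": "obj"}
    print(gsrt_counts([s7, s8]))  # e.g. ("obj", "subj") for "churches", ("subj", "--") for "militants"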

3    The Proposed Method
To describe the document generation process under our proposed “Learning To Summarize” (hence-
forth LeToS), we assume that there are K latent topics and T topic-coupled GSRts associated with
each document; rt is the observed GSRt, wn is the observed word and sp is the observed sentence.
Denote θk to be the expected number of GSRts per topic and πt to be the expected number of words
and sentences per topic-coupled GSRt in each document. Further denote zt to be a K-dimensional
indicator for θ, vp to be the T-dimensional indicator for π, and yn to be an indicator for the same
topic-coupled GSRt proportion as vp, each time a word wn is associated with a particular sentence sp. At
the parameter level, each topic is a multinomial βk over the vocabulary V of words, and each topic
is also a multinomial ρk over the GSRts, following the implicit relation of GSRts to words within
sentence windows. Each topic-coupled GSRt is also treated as a multinomial Ωt over the total number
U of sentences in the corpus. δ(wn ∈ sp) is the delta function which is 1 iff the nth word
belongs to the pth sentence. The document generation process is given as pseudocode in Fig. 2 and
the corresponding graphical model is shown in Fig. 3.
The model can be viewed as a generative process that first generates the GSRts and subsequently
generates the words that describe the GSRt and hence an utterance unit (a sentence in this model).
For each document, we first generate GSRts using a simple LDA model and then for each of the
Nd words, a GSRt is chosen and a word wn is drawn conditioned on the same factor that generated
the chosen GSRt. Instead of influencing the choice of the GSRt to be selected from an assumed
distribution (e.g., uniform or Poisson) over the number of GSRts, the document-specific topic-coupled
proportions are used. Finally, the sentences are sampled from Ωt by choosing a GSRt proportion
that is coupled to the factor that generates rt through the constituent wn. In disjunction, π along
with vp, sp and Ω focuses mainly on coherence among the coarser units, the sentences. However, the
influence of a particular GSRt like "subj→subj" on coherence may be discounted if that is not the


For each document d ∈ 1, ..., M :
    Choose a topic proportion θ|α ∼ Dir(α)
    For each of the Td GSRt positions t in document d:
        Choose topic indicator zt|θ ∼ Mult(θ)
        Choose a GSRt rt|zt = k, ρ ∼ Mult(ρzt)
    Choose a GSRt proportion π|η ∼ Dir(η)
    For each position n in document d:
        For each instance of utterance sp for which wn occurs in sp in document d:
            Choose vp|π ∼ Mult(π)
            Choose yn ∼ vp δ(wn ∈ sp)
            Choose a sentence sp ∼ Mult(Ωvp)
            Choose a word wn|yn = t, z, β ∼ Mult(β_{z_{yn}})




    Figure 2:   Document generation process of LeToS              Figure 3:   Graphical model representation of LeToS



dominant trend in the transition topic. This fact is enforced through the coupling of empirical GSRt
proportions to topics of the sentential words.
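
As a concrete illustration of this generative view, here is a toy numpy sketch of forward sampling under LeToS; treating the number of topic-coupled GSRt slots per document as fixed, collapsing the draws of vp and yn into a single draw per word, and all dimensions and Dirichlet parameters are simplifying assumptions of this sketch, not the authors' implementation.

    # Toy forward-sampling sketch of LeToS (assumed dimensions and parameters).
    import numpy as np

    rng = np.random.default_rng(0)
    K, G, V, U = 4, 49, 500, 30     # topics, GSRt types, vocabulary size, corpus sentences
    T = 20                          # topic-coupled GSRt slots per document (assumption)
    alpha, eta = np.ones(K), np.ones(T)
    rho   = rng.dirichlet(np.ones(G), size=K)   # topic -> GSRt-type multinomials
    beta  = rng.dirichlet(np.ones(V), size=K)   # topic -> word multinomials
    Omega = rng.dirichlet(np.ones(U), size=T)   # GSRt slot -> sentence multinomials

    def generate_document(n_words=40):
        theta = rng.dirichlet(alpha)                           # per-document topic proportions
        z = rng.choice(K, size=T, p=theta)                     # topic indicator per GSRt slot
        r = np.array([rng.choice(G, p=rho[k]) for k in z])     # observed GSRts (simple LDA step)
        pi = rng.dirichlet(eta)                                # topic-coupled GSRt proportions
        words, sentences = [], []
        for _ in range(n_words):
            y = rng.choice(T, p=pi)                            # pick a topic-coupled GSRt slot
            sentences.append(rng.choice(U, p=Omega[y]))        # sentence s_p ~ Mult(Omega_{v_p})
            words.append(rng.choice(V, p=beta[z[y]]))          # word w_n ~ Mult(beta_{z_{y_n}})
        return r, words, sentences

    print(generate_document()[1][:10])   # first ten sampled word ids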

3.1 Parameter Estimation and Inference

In this paper we have resorted to mean field variational inference [2] to find as tight an
approximation as possible to the log likelihood of the data (the joint distribution of the observed
variables given the parameters), by minimizing the KL divergence of an approximate factorized mean
field distribution to the posterior distribution of the latent variables given the data. In the
variational setting, for each document we have $\sum_{k=1}^{K} \phi_{tk} = 1$, $\sum_{t=1}^{T} \lambda_{nt} = 1$ and
$\sum_{t=1}^{T} \zeta_{pt} = 1$, and the approximating distribution is factorized as:
$$q(\theta, \pi, \mathbf{z}, \mathbf{y}, \mathbf{v} \mid \gamma, \chi, \phi, \lambda, \zeta) = q(\theta|\gamma)\, q(\pi|\chi) \prod_{t=1}^{T} q(z_t|\phi_t) \prod_{n=1}^{N} q(y_n|\lambda_n) \prod_{p=1}^{P} q(v_p|\zeta_p) \qquad (1)$$


The variational functional to optimize can be shown to be
$$\mathcal{F} = E_q[\log p(\mathbf{r}, \mathbf{w}, \mathbf{s} \mid \alpha, \theta, \eta, \pi, \rho, \beta, \Omega)] - E_q[\log q(\theta, \pi, \mathbf{z}, \mathbf{y}, \mathbf{v} \mid \gamma, \chi, \phi, \lambda, \zeta)] \qquad (2)$$
where $E_q[f(\cdot)]$ is the expectation of $f(\cdot)$ under the $q$ distribution.
The coordinate-ascent updates of the variational parameters governing the topic and topic-coupled
GSRt indicators are as follows:
$$\gamma_i = \alpha_i + \sum_{t=1}^{T_d} \phi_{ti}; \qquad \chi_t = \eta_t + \sum_{n=1}^{N_d} \lambda_{nt} + \sum_{p=1}^{P_d} \zeta_{pt}$$
$$\lambda_{nt} \propto \exp\Big\{\Big(\Psi(\chi_t) - \Psi\Big(\sum_{f=1}^{T} \chi_f\Big)\Big) + \Big(\sum_{i=1}^{K} \phi_{ti} \log \beta_{z_{(y_n=t)}=i,\,n}\Big)\Big\}$$
$$\phi_{ti} \propto \exp\Big\{\log \rho_{it} + \Big(\Psi(\gamma_i) - \Psi\Big(\sum_{k=1}^{K} \gamma_k\Big)\Big) + \Big(\sum_{n=1}^{N_d} \lambda_{nt} \log \beta_{z_{(y_n=t)}=i,\,n}\Big)\Big\}$$
$$\zeta_{pt} \propto \Omega_{pt} \exp\Big\{\Psi(\chi_t) - \Psi\Big(\sum_{j=1}^{T} \chi_j\Big)\Big\}$$
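
One way these coordinate-ascent updates could be implemented per document is sketched below in numpy/scipy; the shapes, the use of a per-document slice of Ω and η sized to the Td slots, and the single β lookup per word for the coupled term are simplifying assumptions of this sketch rather than the authors' code.

    # Per-document E-step sketch for LeToS (assumed shapes; not the authors' code).
    import numpy as np
    from scipy.special import digamma

    def e_step_doc(word_ids, gsrt_ids, sent_ids, log_beta, log_rho, Omega, alpha, eta, n_iter=50):
        # word_ids: (Nd,) word indices; gsrt_ids: (Td,) GSRt-type indices of the slots;
        # sent_ids: (Pd,) corpus sentence indices; log_beta: (K, V); log_rho: (K, G);
        # Omega: (Td, U) slice for this document's slots; eta: (Td,); alpha: (K,).
        K = log_beta.shape[0]
        Td, Nd, Pd = len(gsrt_ids), len(word_ids), len(sent_ids)
        phi  = np.full((Td, K), 1.0 / K)      # q(z_t = i)
        lam  = np.full((Nd, Td), 1.0 / Td)    # q(y_n = t)
        zeta = np.full((Pd, Td), 1.0 / Td)    # q(v_p = t)
        for _ in range(n_iter):
            gamma = alpha + phi.sum(axis=0)                      # gamma_i update
            chi   = eta + lam.sum(axis=0) + zeta.sum(axis=0)     # chi_t update
            dig_chi   = digamma(chi) - digamma(chi.sum())
            dig_gamma = digamma(gamma) - digamma(gamma.sum())
            # lambda_{nt} ~ exp{ Psi(chi_t) - Psi(sum chi) + sum_i phi_{ti} log beta_{i, w_n} }
            log_lam = dig_chi[None, :] + (phi @ log_beta[:, word_ids]).T
            lam = np.exp(log_lam - log_lam.max(axis=1, keepdims=True))
            lam /= lam.sum(axis=1, keepdims=True)
            # phi_{ti} ~ exp{ log rho_{i, r_t} + Psi(gamma_i) - Psi(sum gamma)
            #                 + sum_n lambda_{nt} log beta_{i, w_n} }
            log_phi = log_rho[:, gsrt_ids].T + dig_gamma[None, :] + lam.T @ log_beta[:, word_ids].T
            phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
            phi /= phi.sum(axis=1, keepdims=True)
            # zeta_{pt} ~ Omega_{t, s_p} * exp{ Psi(chi_t) - Psi(sum chi) }
            zeta = Omega[:, sent_ids].T * np.exp(dig_chi)[None, :]
            zeta /= zeta.sum(axis=1, keepdims=True)
        return gamma, chi, phi, lam, zeta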

We now write the expressions for the maximum likelihood estimates of the parameters of the original
graphical model, obtained by taking derivatives of the functional $\mathcal{F}$ in Eq. (2) w.r.t. the
parameters. We have the following results:
$$\rho_{ig} \propto \sum_{d=1}^{M} \sum_{t=1}^{T_d} \phi_{dti}\, r_{dt}^{g}; \qquad \beta_{ij} \propto \sum_{d=1}^{M} \sum_{n=1}^{N_d} \Big(\sum_{t=1}^{T_d} \lambda_{nt}\phi_{ti}\Big) w_{dn}^{j}; \qquad \Omega_{tu} \propto \sum_{d=1}^{M} \sum_{p=1}^{P_d} \zeta_{dpt}\, s_{dp}^{u}$$
where $r_{dt}^{g}$ is 1 iff $t = g$ and 0 otherwise, with $g$ an index variable over all possible GSRts;
$u$ is an index into one of the $U$ sentences in the corpus, and $s_{dp}^{u} = 1$ if the $p$th sentence
in document $d$ is the $u$th sentence among the $U$. The updates of $\alpha$ and $\eta$ are exactly the
same as in [2].
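
The corresponding M-step is just an accumulation of these sufficient statistics over documents, followed by row normalization. A sketch, reusing the assumed shapes from the E-step sketch above (with a hypothetical layout of per-document results), is:

    # M-step sketch: accumulate per-document variational statistics, then normalize.
    import numpy as np

    def m_step(doc_results, K, G, V, U, T):
        # doc_results: iterable of (phi, lam, zeta, word_ids, gsrt_ids, sent_ids) per document,
        # with the shapes used in e_step_doc above (an assumed layout).
        rho, beta, Omega = np.zeros((K, G)), np.zeros((K, V)), np.zeros((T, U))
        for phi, lam, zeta, word_ids, gsrt_ids, sent_ids in doc_results:
            np.add.at(rho.T, gsrt_ids, phi)         # rho_{ig}   += sum_t phi_{dti} [r_{dt} = g]
            np.add.at(beta.T, word_ids, lam @ phi)  # beta_{ij}  += sum_n (sum_t lam_{nt} phi_{ti}) [w_{dn} = j]
            np.add.at(Omega.T, sent_ids, zeta)      # Omega_{tu} += sum_p zeta_{dpt} [s_{dp} = u]
        for mat in (rho, beta, Omega):              # normalize rows into multinomials
            mat /= mat.sum(axis=1, keepdims=True)
        return rho, beta, Omega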
For obtaining summaries, we order sentences w.r.t. query words by computing the following:
$$p(s_{dp} \mid \mathbf{w}_q) \propto \sum_{l=1}^{Q} \Big(\sum_{t=1}^{T} \sum_{i=1}^{K} \zeta_{dpt}\, \phi_{dti}\, (\lambda_{dlt}\, \phi_{dti})\, \gamma_{di}\, \chi_{dt}\Big)\, \delta(w_l \in s_{dp}) \qquad (3)$$
where $Q$ is the number of query words, $w_l$ is the $l$th query word, and the candidate sentences
$s_{dp}$ come from documents $d$ that are relevant to the query. Further, the sentences


are scored over only “rich” GSRts which lack any “−− → −−” transitions whenever possible. We
also expand the query by a few words while summarizing in real time using topic inference on the
relevant set of documents.
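
Given the per-document variational quantities, the ranking in Eq. (3) reduces to a small tensor contraction per sentence. The sketch below assumes the summation form reconstructed above and that the caller has already restricted the query-word positions to those occurring in the sentence (the δ term); names and shapes are hypothetical.

    # Sentence-scoring sketch for Eq. (3); shapes as in the E-step sketch above.
    import numpy as np

    def score_sentence(p, query_positions_in_sentence, phi, lam, zeta, gamma, chi):
        # p: sentence index within the document; query_positions_in_sentence: word
        # positions n of query words that occur in sentence p (the delta term);
        # phi: (Td, K), lam: (Nd, Td), zeta: (Pd, Td), gamma: (K,), chi: (Td,).
        score = 0.0
        for l in query_positions_in_sentence:
            score += np.sum(zeta[p][:, None] * phi * (lam[l][:, None] * phi)
                            * gamma[None, :] * chi[:, None])
        return score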

4     Results and Discussions
Tables 1 and 2 show some topics learnt from the TAC2009 dataset
(http://www.nist.gov/tac/2009/Summarization/index.html). From Table 2, we observe that the topics under
both models are qualitatively the same. Moreover, it has been observed that constraining LeToS to
words and GSRts as the only observed variables shows lower word perplexity than LDA on heldout
test data. Empirically, the time complexity of LeToS is slightly higher than that of
LDA due to the extra iterations over the GSRts and sentences.
 Table 1: Some topics under LDA for TAC2009
     topic16: Kozlowski, million, Tyco, company, trial, Swartz, loans
     topic36: bombings, Malik, Sikhs, Bagri, India, case, killed
     topic38: solar, energy, power, BP, company, year, panel
     topic22: Hurricane, Rita, evacuations, Texas, Louisiana, area, state

 Table 2: Some topics under LeToS for TAC2009
     topic58: Kozlowski, Tyco, million, company, loan, trial, Swartz
     topic1: Malik, bombs, India, Sikh, killing, Flight, Bagri
     topic42: solar, energy, power, electricity, systems, government, production
     topic28: Hurricane, Rita, evacuated, storms, Texas, Louisiana, area
For TAC2009, using the more meaningful Pyramid [4] scoring for summaries, the average Pyramid
scores for very short 100-word summaries over 44 queries were 0.3024 for the A timeline
and 0.2601 for the B timeline for LeToS, ranking 13th and 9th of 52 submissions. The scores
for a state-of-the-art summarization system [5] that uses coherence to some extent, and for a baseline
returning all the leading sentences (up to 100 words) of the most recent document, are (0.1756 and
0.1601) and (0.175 and 0.160) respectively for the A and B timelines. The score for the B timeline
is lower due to redundancy.

5      Conclusion
Overall, we have integrated centering-theory-based coherence into a topic model. Models like LeToS
tend to capture "what is being discussed" by selecting sentences that place a low "inference
load" on the reader. On the other hand, the model gets penalized if the summaries need to be very factual. This
could probably be avoided by defining finer GSR categories such as named entities. Another drawback
of the model is its lack of understanding of the meaning of the query. However, generating
specific summaries w.r.t. an information need using topic modeling is akin to answering natural
language questions; that problem is hard and remains open under the topic modeling umbrella.

References
[1] Regina Barzilay and Mirella Lapata. Modeling local coherence: an entity-based approach. In
    ACL ’05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguis-
    tics, pages 141–148. Association for Computational Linguistics, 2005.
[2] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of
    Machine Learning Research, 3:993–1022, 2003.
[3] Barbara J. Grosz, Scott Weinstein, and Arvind K. Joshi. Centering: A framework for modeling
    the local coherence of discourse. Computational Linguistics, 21:203–225, 1995.
[4] Aaron Harnly, Ani Nenkova, Rebecca Passonneau, and Owen Rambow. Automation of sum-
    mary evaluation by the pyramid method. In Recent Advances in Natural Language Processing
    (RANLP), 2005.
[5] Rohini Srihari, Li Xu, and Tushar Saxena. Use of ranked cross document evidence trails for
    hypothesis generation. In Proceedings of the 13th International Conference on Knowledge
    Discovery and Data Mining (KDD), pages 677–686, San Jose, CA, 2007.


