Barzilay & Lapata 2008 presentation

Barzilay & Lapata 2008 'Modeling Local Coherence: An Entity-Based Approach' presentation for Discourse Parsing and Language Technology seminar.

  • Computed from a standard dependency parser
  • Does worse on out-of-domain training.

    1. 1. Modeling Local Coherence: An Entity-Based Approach. Regina Barzilay (MIT), Mirella Lapata (UoE). ACL 2008
    2. 2. Abstract: This article proposes a novel framework for representing and measuring local coherence. Central to this approach is the entity-grid representation of discourse, which captures patterns of entity distribution in a text. The algorithm introduced in the article automatically abstracts a text into a set of entity transition sequences and records distributional, syntactic, and referential information about discourse entities. We re-conceptualize coherence assessment as a learning task and show that our entity-based representation is well-suited for ranking-based generation and text classification tasks. Using the proposed representation, we achieve good performance on text ordering, summary coherence evaluation, and readability assessment.
    3. 3. Introduction: A key requirement for any system that produces text is the coherence of its output. Use of coherence theories: text generation, especially text indistinguishable from human writing. Previous efforts have relied on handcrafted rules, valid only for limited domains, with no guarantee of scalability or portability (Reiter and Dale 2000). Furthermore, coherence constraints are often embedded in complex representations (e.g., Asher and Lascarides 2003) which are hard to implement in a robust application.
    4. 4. Introduction: Here, the focus is on local coherence (sentence to sentence). Necessary for global coherence, too. The key premise of our work is that the distribution of entities in locally coherent texts exhibits certain regularities. Covered before in Centering Theory (Grosz, Joshi, and Weinstein 1995) and other entity-based theories of discourse (e.g., Givon 1987; Prince 1981).
    5. 5. Introduction: The proposed entity-based representation of discourse allows us to learn the properties of coherent texts from a corpus, without recourse to manual annotation or a predefined knowledge base. Usefulness tests: text ordering, automatic evaluation of summary coherence, and readability assessment. Lapata formulates text ordering and summary evaluation as ranking problems, with a learning model.
    6. 6. Introduction: Evaluation: In the text-ordering task our algorithm has to select a maximally coherent sentence order from a set of candidate permutations. In the summary evaluation task, we compare the rankings produced by the model against human coherence judgments elicited for automatically generated summaries. In both experiments, our method yields improvements over state-of-the-art models.
    7. 7. Introduction: Evaluation: By incorporating coherence features stemming from the proposed entity-based representation, we improve the performance of a state-of-the-art readability assessment system.
    8. 8. Outline: 2. Related Work; 3. The Coherence Model; 4. Experiment 1: Sentence Ordering; 5. Experiment 2: Summary Coherence Rating; 6. Experiment 3: Readability Assessment; 7. Discussion and Conclusions
    9. 9. Related Work: 1. Summary of entity-based theories of discourse, and an overview of previous attempts to translate their underlying principles into computational coherence models. 2. Description of ranking approaches to natural language generation, with a focus on coherence metrics used in current text planners.
    10. 10. Related Work: 2.1 Entity-Based Approaches to Local Coherence. Entity-based accounts of local coherence are common. Unifying assumption: discourse coherence is achieved in view of the way discourse entities are introduced and discussed. Commonly formalized by devising constraints on the linguistic realization and distribution of discourse entities in coherent texts.
    11. 11. Related Work: Centering theory: salience concerns how entities are realized in an utterance. Elsewhere, salience is defined in terms of topicality (Chafe 1976; Prince 1978), predictability (Kuno 1972; Halliday and Hasan 1976), and cognitive accessibility (Gundel, Hedberg, and Zacharski 1993).
    12. 12. Related Work: Entity-based theories capture coherence by characterizing the distribution of entities across discourse utterances, distinguishing between salient entities and the rest. The intuition here is that texts about the same discourse entity are perceived to be more coherent than texts fraught with abrupt switches from one topic to the next.
    13. 13. Related Work: Hard to model coherence computationally (often because the underlying theories are not fully fleshed out). Such models often use manual annotations as bootstrappers for algorithms.
    14. 14. Related Work: B&L: Not based on any particular theory. The inference model combines relevant information (not manual annotations). Emphasizes automatic computation for both the underlying discourse representation and the inference procedure. Automatic, albeit noisy, feature extraction allows performing a large-scale evaluation of differently instantiated coherence models across genres and applications.
    15. 15. Related Work: 2.2 Ranking Approaches. Produce a large set of candidate outputs, then rank them based on desired features using a ranking function. A two-stage generate-and-rank system minimizes complexity. Regarding coherence, text planning is important for coherent output; the same iterated ranking system applies to text plans. Feature selection and weighting done manually: not sufficient. "The problem is far too complex and our knowledge of the issues involved so meager that only a token gesture can be made at this point." (Mellish et al. 1998, p. 100)
    16. 16. Related Work: B&L introduce an entity-based representation of discourse that is automatically computed from raw text; the representation reveals entity transition patterns characteristic of coherent texts. This can be easily translated into a large feature space which lends itself naturally to the effective learning of a ranking function, without explicit manual involvement.
    17. 17. The Coherence Model: 3.1 Entity-Grid Discourse Representation. Each text is represented by an entity grid, a two-dimensional array that captures the distribution of discourse entities across text sentences. The rows of the grid correspond to sentences, and the columns correspond to discourse entities. By discourse entity we mean a class of coreferent noun phrases. Each grid cell thus corresponds to a string from a set of categories reflecting whether the entity in question is a subject (S), an object (O), neither (X), or absent from the sentence (–).
    18. 18. The Coherence Model
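The grid described above can be sketched in a few lines. The sentence/role pairs below are hypothetical inputs; in the paper these roles come from a coreference resolver and a syntactic parser, not hand-written dictionaries.

```python
# Minimal sketch of an entity grid: rows = sentences, columns = entities,
# cells drawn from {S, O, X, -}. Input format (entity -> role per sentence)
# is an assumption made for illustration.
def build_entity_grid(sentences):
    """Return (column labels, grid) for a list of {entity: role} dicts."""
    entities = sorted({e for sent in sentences for e in sent})
    grid = [[sent.get(e, "-") for e in entities] for sent in sentences]
    return entities, grid

# Toy three-sentence document (hypothetical roles).
sents = [
    {"Microsoft": "S", "market": "X"},  # sentence 1
    {"Microsoft": "O", "trial": "X"},   # sentence 2
    {"Microsoft": "S"},                 # sentence 3
]
cols, grid = build_entity_grid(sents)
```

Reading down a column gives the entity's role history, which is exactly what the transition features described next are computed from.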
    19. 19. The Coherence Model: 3.2 Entity Grids as Feature Vectors. Assumption: the distribution of entities in coherent texts exhibits certain regularities reflected in grid topology. One would further expect that entities corresponding to dense columns are more often subjects or objects, for instance.
    20. 20. The Coherence Model: Analysis revolves around local entity transitions: a sequence {S, O, X, –}^n that represents entity occurrences and their syntactic roles in n adjacent sentences (and their probability in the text). Each text is represented by a fixed set of transition sequences using a standard feature vector notation, which can be used for: learning algorithms; identifying information relevant to coherence assessment.
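The transition probabilities that form the feature vector can be sketched as follows. The grid here is a toy example using plain `-` for the absent role; the transition length n=2 is one of the settings the representation allows, not the only one.

```python
from itertools import product

def transition_features(grid, n=2):
    """Probability of each length-n role transition, counted column-wise."""
    counts = {t: 0 for t in product("SOX-", repeat=n)}
    total = 0
    for col in range(len(grid[0])):
        column = [row[col] for row in grid]           # one entity's role history
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1       # count this transition
            total += 1
    return {t: c / total for t, c in counts.items()}  # normalize to probabilities

# Toy grid: 3 sentences x 2 entities.
grid = [["S", "X"], ["O", "-"], ["S", "-"]]
feats = transition_features(grid, n=2)
```

The resulting dictionary has one entry per possible transition (16 for n=2), giving the fixed-length feature vector the slide describes.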
    21. 21. The Coherence Model: 3.3 Grid Construction: Linguistic Dimensions. What linguistic information is relevant for coherence prediction? How should we represent it? What should the parameters be for a good computational, automatic model?
    22. 22. The Coherence Model: Parameters: Exploration of the parameter space is guided by: the linguistic importance of the parameter (linked to local coherence); the accuracy of automatic computation (granularity, etc.); the size of the resulting feature space (too big is not good).
    23. 23. The Coherence Model: Entity extraction: a co-reference resolution system (Ng & Cardie 2002) using various lexical, grammatical, semantic, and positional features. For different domains/languages: simply cluster nouns based on identity. Works consistently. Grammatical function: Collins' 1997 parser.
    24. 24. The Coherence Model: Salience: Evaluate by using two models: one with uniform treatment, one that discriminates between transitions of salient entities and the rest, based on frequency counts. Compute each salience group's transitions separately, then combine them into a single feature vector.
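The salience split can be sketched by partitioning grid columns on mention frequency. The threshold value here is an illustrative assumption, not the paper's exact setting; transitions would then be counted separately over each group.

```python
# Split grid columns into salient vs. non-salient entities by frequency
# (threshold=2 is an assumed value for illustration).
def split_by_salience(grid, threshold=2):
    """Return (salient_columns, other_columns), each a list of role histories."""
    salient, rest = [], []
    for col in range(len(grid[0])):
        column = [row[col] for row in grid]
        mentions = sum(cell != "-" for cell in column)  # times entity is realized
        (salient if mentions >= threshold else rest).append(column)
    return salient, rest

# Toy grid: entity 0 is mentioned in all 3 sentences, entity 1 only once.
grid = [["S", "X"], ["O", "-"], ["S", "-"]]
salient, rest = split_by_salience(grid)
```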
    25. 25. The Coherence Model: With the feature vector representation, coherence assessment becomes a machine learning problem. By encoding texts as entity transition sequences, the algorithm can learn a ranking function (instead of manually specifying it). The feature vector representation can also be used for conventional classification tasks (apart from information ordering and summary coherence rating).
    26. 26. Sentence Ordering: A document is a bag of sentences, and the algorithm's task is to find the maximally coherent order. Again, the algorithm is used here to rank alternative sentence orderings, not to find the optimal one. (Local coherence is not enough for this.)
    27. 27. Sentence Ordering: 4.1 Modeling. Training set: ordered pairs of alternate renderings (xij, xik) of document di, j > k. The goal is to find a parameter vector w that yields a ranking score function minimizing violations of pairwise rankings in the training set: ∀(xij, xik) ∈ r∗ : w · Φ(xij) > w · Φ(xik), where r∗ is the optimal ranking and Φ(xij), Φ(xik) are mappings onto features representing the coherence properties of renderings xij and xik.
    28. 28. Sentence Ordering: 4.1 Modeling. The ideal ranking function, represented by the weight vector w, would satisfy the condition: w · (Φ(xij) − Φ(xik)) > 0 ∀ i, j, k such that j > k. Total number of training and test instances in the corpora: Earthquakes: train 1,896, test 2,056; Accidents: train 2,095, test 2,087.
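The pairwise condition w · (Φ(xij) − Φ(xik)) > 0 can be illustrated with a simple mistake-driven learner. This perceptron-style update is a stand-in for the actual ranker trained in the paper, and the 2-d feature vectors are toy values, not real entity-grid features.

```python
# Pairwise ranking sketch: learn w so that w·better > w·worse for every
# training pair. Perceptron update used for illustration only.
def train_ranker(pairs, dim, epochs=50):
    """pairs: list of (better, worse) feature vectors; returns weight vector w."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]   # Φ(x_ij) − Φ(x_ik)
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + di for wi, di in zip(w, diff)]    # fix violated pair
    return w

def score(w, x):
    """Ranking score w · Φ(x)."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy pairs: first vector in each pair is the more coherent rendering.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
w = train_ranker(pairs, dim=2)
```

At test time, the candidate ordering with the highest score is selected.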
    29. 29. Sentence Ordering: 4.2 Method. Data: To acquire a large collection for training and testing, B&L created synthetic data, wherein the candidate set consists of a source document and permutations of its sentences. AP articles on earthquakes and the National Transportation Safety Board's aviation accident database; 100 source articles, with up to 20 randomly generated permutations for training.
    30. 30. Sentence Ordering: Comparison with State-of-the-Art Methods: Compared against Foltz, Kintsch, and Landauer 1998 and Barzilay and Lee 2004; both rely mainly on lexical information, unlike here. FKL98: an LSA coherence measure for the semantic relatedness of adjacent sentences. BL04: an HMM, where states are topics. Evaluation metric: Accuracy = correct predictions / size of test set. Random baseline = 50% (binary).
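The accuracy metric above is just the fraction of (better, worse) pairs the ranker orders correctly; a random scorer gets 0.5. A small self-contained sketch (weights and pairs are toy values):

```python
# Pairwise accuracy = correct predictions / size of test set; chance = 0.5.
def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_accuracy(w, test_pairs):
    """Fraction of (better, worse) pairs scored in the right order."""
    correct = sum(score(w, b) > score(w, c) for b, c in test_pairs)
    return correct / len(test_pairs)

# A weight vector that gets exactly one of two toy pairs right.
w = [1.0, -1.0]
test_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
acc = pairwise_accuracy(w, test_pairs)
```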
    31. 31. Sentence Ordering
    32. 32. Sentence Ordering: Comparison with SotA methods: Outperforms LSA on both domains, because of coreference plus grammatical role information, a more holistic representation (over more than 2 sentences), and exposure to domain-relevant texts. The HMM is comparable on the Earthquakes corpus but not on Accidents; the two approaches may be complementary.
    33. 33. Sentence Ordering
    34. 34. Sentence Ordering
    35. 35. Summary Coherence Rating: Tested model-induced rankings against human rankings. If successful, this holds implications for the automatic evaluation of machine-generated texts. Better than BLEU or ROUGE, which weren't designed for this.
    36. 36. Summary Coherence Rating: 5.1 Modeling. Summary coherence rating is also a ranking learning task, same as before.
    37. 37. Summary Coherence Rating: 5.2 Data. Evaluation based on the Document Understanding Conference 2003, which has rated summaries. Not good enough, so B&L randomly selected 16 input document clusters and five systems that produced summaries. Ratings were collected from 177 unpaid internet volunteers, then checked by leave-one-out resampling and discretization into two classes. Training: 144 summaries. Test: 80 pairwise ratings. Dev: 6 documents.
    38. 38. Summary Coherence Rating: Experiment 1 applied the co-reference resolution tool to human-written texts; here, the co-reference tool is applied to automatically generated summaries. Compared against LSA, not B&L 04 (domain-dependent). Did much better (p < .01).
    39. 39. Summary Coherence Rating
    40. 40. Summary Coherence Rating
    41. 41. Readability Assessment: Can entity grids be used for style classification? Judged against Schwarm and Ostendorf 2005's method for assessing readability (among others).
    42. 42. Readability Assessment: As in S&O05, readability assessment is a classification task. The training sample consisted of n documents (x⃗1, y1), ..., (x⃗n, yn), with x⃗i ∈ R^N and yi ∈ {−1, +1}, where x⃗i is a feature vector for the i-th document in the training sample and yi its (positive or negative) class label.
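The (x⃗i, yi) formulation above is a standard binary classification setup. A minimal perceptron sketch illustrates it; this stand-in is not the classifier used in the paper, and the feature values are toy numbers.

```python
# Binary classifier sketch over (feature vector, label) pairs, y in {-1, +1}.
def train_classifier(samples, dim, epochs=50):
    """Mistake-driven linear classifier; returns weight vector w."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in samples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # update on mistakes
    return w

def predict(w, x):
    """Class label from the sign of w · x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Toy, linearly separable sample (hypothetical feature values).
samples = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w = train_classifier(samples, dim=2)
```

In the paper's setting, x⃗i would hold the entity-transition (and other readability) features and yi the adult/child readability label.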
    43. 43. Readability Assessment: 6.2 Method. Data: 107 articles from Encyclopedia Britannica and Britannica Elementary (from Barzilay & Elhadad 2003).
    44. 44. Readability Assessment
    45. 45. Readability Assessment: Features: Two versions, one with S&O features (syntactic, semantic, and a combination, e.g. the Flesch-Kincaid formula), and one with additional coherence-based features using the entity transition notation (compared against LSA).
    46. 46. Readability Assessment
    47. 47. Readability Assessment
    48. 48. Discussion and Conclusions: Presented: a novel framework for representing and measuring text coherence. Central to this framework is the entity-grid representation of discourse, which captures important patterns of sentence transitions. Coherence assessment is re-conceptualized as a learning task. Good performance on text ordering, summary coherence evaluation, and readability assessment.
    49. 49. Discussion and Conclusions: The entity grid is a flexible, yet computationally tractable, representation. Three important parameters for grid construction: the computation of coreferring entity classes; the inclusion of syntactic knowledge; the influence of salience. Empirically validated the importance of salience and syntactic information for coherence-based models.
    50. 50. Discussion and Conclusions: Full coreference resolution is not perfect (mismatches between training and testing conditions). Instead of an automatic coreference resolution system, entity classes can be approximated simply by string matching.
    51. 51. Discussion and Conclusions: This approach is not a direct implementation of any theory in particular, in favor of automatic computation and breadth of coverage. Findings: pronominalization is a good indicator of document coherence; coherent texts are characterized by transitions with particular properties which do not hold for all discourses; these models are sensitive to the domain at hand and the type of texts under consideration (human-authored vs. machine-generated texts).
    52. 52. Discussion and Conclusions: Future work: augmenting the entity-based representation with fine-grained lexico-semantic knowledge; clustering entities based on their semantic relatedness, thereby creating a grid representation over lexical chains; developing fully lexicalized models, akin to traditional language models; expanding grammatical categories to modifiers and adjuncts, which may provide additional information, in particular for machine-generated texts; investigating whether the proposed discourse representation and modeling approaches generalize across different languages; improving prediction on both local and global levels, with the ultimate goal of handling longer texts.
