Marcu 2000 presentation

Presentation of the Marcu 2000 ACL paper "The rhetorical parsing of unrestricted texts- A surface-based approach" for Discourse Parsing and Language Technology seminar.


Published in: Technology, Education
Notes
  • Here we see an example of rhetorical structures in a text. The internal nodes are labeled with the names of the rhetorical relations that hold between the textual spans that are subsumed by their child nodes. Each relation between two nodes is represented graphically by means of a combination of straight lines and arcs. The material subsumed by the text span that corresponds to the starting point of an arc is subsidiary to the material subsumed by the text span that corresponds to the end point of an arc. A relation represented only by straight lines corresponds to cases in which the subsumed text spans are equally important. Text spans that subsume subsidiary information, i.e., text spans that correspond to starting points of arcs, are called satellites. All other text spans are called nuclei. Text fragments surrounded by curly brackets denote parenthetical units: their deletion does not affect the understanding of the textual unit to which they belong.
  • 1 and 3 will be covered later.
  • Regarding correlation: think of the earth and the moon.
  • Last point: if a rhetorical relation holds between two textual spans of the tree structure of a text, either that relation is extended or it can be explained in terms of a simple relation that holds between the promotion units of the constituent subspans.
  • As the results in Table 8 show, the rhetorical parser fails to identify a fair number of elementary units (51.2% recall); but the units it identifies tend to be correct (95.9% precision). As a consequence, performance at all other levels is affected. With respect to identifying hierarchical spans, recall is about 25% lower than the average human performance; with respect to labeling the nuclear status of spans, recall is about 30% below human performance; and with respect to labeling the rhetorical relations that hold between spans, recall is about 40% below human performance. In general, the precision of the rhetorical parser comes close to the human performance level. However, since the level of granularity at which the rhetorical parser works is much coarser than that used by human judges, many sentences are assigned a much simpler structure than the structure built by humans. For example, whenever an analyst used a JOINT relation to connect two clause-like units separated by an and, the rhetorical parser failed to identify the two units; it often treated them as a single elementary unit. As a consequence, the recall figures at all levels were significantly lower than those specific to the humans.
  • Transcript

    • 1. The Rhetorical Parsing of Unrestricted Texts: A Surface-based Approach. Daniel Marcu (USC), ACL 2000
    • 2. Abstract: This paper explores the extent to which well-formed rhetorical structures can be automatically derived by means of surface-form-based algorithms. These algorithms: identify discourse usages of cue phrases; break sentences into clauses; hypothesize rhetorical relations that hold among textual units; produce valid rhetorical structure trees for unrestricted natural language texts. The algorithms are empirically grounded in a corpus analysis of cue phrases and rely on a first-order formalization of rhetorical structure trees.
    • 3. Abstract: The algorithms are evaluated both intrinsically and extrinsically. The intrinsic evaluation assesses the resemblance between automatically and manually constructed rhetorical structure trees. The extrinsic evaluation shows that automatically derived rhetorical structures can be successfully exploited in the context of text summarization.
    • 4. Motivation
    • 5. Motivation: Marcu explores the ground found at the intersection of: theories from a traditional, truth-based semantic perspective (weak); theories aimed at characterizing the constraints that pertain to the structure of unrestricted texts and the computational mechanisms that would enable the derivation of these structures (which are weak). Goal: Automatically build rhetorical constructions by relying only on cohesion and connectives (i.e., phrases such as for example, and, although, and however).
    • 6. Outline: 2. Foundation; 3. Corpus Analysis of Cue Phrases; 4. Rhetorical Parsing Algorithm; 5. Evaluation; 6. Related Work; 7. Conclusion (Recap)
    • 7. Foundation: The hypothesis that underlies this work is that connectives, cohesion, shallow processing, and a well-constrained mathematical model of valid rhetorical structure trees (RS-trees) can be used to implement algorithms that determine: the elementary units of a text, i.e., the units that constitute the leaves of the RS-tree of that text; the rhetorical relations that hold between elementary units and between spans of text; the relative importance (nucleus or satellite) and the size of the spans subsumed by these rhetorical relations.
    • 8. Foundation: 2.1 Determining the Elementary Units Using Connectives and Shallow Processing. Pros: punctuation, connectives. Cons: Difficult (consider and). Using these, elementary unit boundaries can be determined with approximately 80% accuracy.
    • 9. Foundation: 2.2 Determining Rhetorical Relations Using Connectives. Pros: psychologically useful, common/regular. Cons: 1. Sentence vs. discourse function unclear; 2. They do not indicate scope of relation; 3. Can signal different rhetorical relations.
    • 10. Foundation: 2.2.2 Discourse markers ambiguous as to scope: Rhetorical relations that hold between large textual spans can be explained in terms of similar relations that hold between their most important elementary units. (Marcu, elsewhere) Compositionality criterion for valid rhetorical structures: posits that a rhetorical structure tree is valid only if each rhetorical relation that holds between two spans is either an extended rhetorical relation or can be explained in terms of a simple rhetorical relation.
    • 11. Foundation: Discussion: The more complex the text, the harder to automatically identify. “I have never come across a case in which a simple connective signaled more than one rhetorical relation.” “I have never come across an example that would require one to deal with exclusively disjunctive hypotheses [that do not overlap in their scope].”
    • 12. Foundation: 2.3 Determining Rhetorical Relations Using Cohesion. Pros: co-occurrence can determine thematic continuity; correlation between cohesion-defined textual segments and hierarchical, intentionally defined segments; cohesion works for smaller relations too. Cons: Marcu uses a coarse model of the relation between cohesion and rhetorical relations (to make things simpler).
    • 13. Foundation: 2.4 Determining Rhetorical Structure Using a Well-Constrained Mathematical Model. Uses first-order logic, with these features and constraints: A valid rhetorical structure is a binary tree whose leaves denote elementary textual units. Rhetorical relations hold between textual units and spans of various sizes. These relations are paratactic or hypotactic. Each node of a rhetorical structure tree has associated a status (NUCLEUS or SATELLITE), a type (the rhetorical relation that holds between the text spans that the node spans over), and a set of promotion units.
    • 14. Foundation: 2.4 Determining Rhetorical Structure Using a Well-Constrained Mathematical Model. The status and type associated with each node are unique. The rhetorical relations of a valid rhetorical structure hold only between adjacent spans. There exists a span, which corresponds to the root node of the structure, that spans over the entire text. The status, type, and promotion set associated with each node reflect the compositionality criterion.
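The node attributes and structural constraints listed on slides 13 and 14 can be sketched as a small data structure. This is a minimal sketch, not the paper's first-order formalization: the class name `RSNode`, the helper `is_well_formed`, and the span encoding are hypothetical, and the promotion rule used here (a leaf promotes itself, an internal node promotes the units of its nucleus children) is an assumption that the slides do not spell out. The compositionality criterion itself is not checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass
class RSNode:
    """One node of a binary rhetorical structure tree (hypothetical encoding)."""
    span: Tuple[int, int]            # inclusive (first, last) elementary-unit indices
    status: str                      # "NUCLEUS" or "SATELLITE"
    relation: Optional[str] = None   # node type; None for leaves
    left: Optional["RSNode"] = None
    right: Optional["RSNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def promotion(self) -> FrozenSet[int]:
        """Assumed rule: leaves promote themselves, internal nodes
        promote the promotion units of their nucleus children."""
        if self.is_leaf():
            return frozenset({self.span[0]})
        units = set()
        for child in (self.left, self.right):
            if child.status == "NUCLEUS":
                units |= child.promotion()
        return frozenset(units)


def is_well_formed(node: RSNode) -> bool:
    """Check the structural constraints only: binary branching,
    adjacent child spans, and a parent span that covers both children."""
    if node.is_leaf():
        return node.span[0] == node.span[1]
    if node.left is None or node.right is None:
        return False  # not binary
    adjacent = node.left.span[1] + 1 == node.right.span[0]
    covers = node.span == (node.left.span[0], node.right.span[1])
    has_nucleus = "NUCLEUS" in (node.left.status, node.right.status)
    return (adjacent and covers and has_nucleus
            and is_well_formed(node.left) and is_well_formed(node.right))


# Three units: unit 1 is the nucleus of the text, units 2-3 form a satellite span.
leaf1 = RSNode((1, 1), "NUCLEUS")
leaf2 = RSNode((2, 2), "NUCLEUS")
leaf3 = RSNode((3, 3), "SATELLITE")
inner = RSNode((2, 3), "SATELLITE", "ELABORATION", leaf2, leaf3)
root = RSNode((1, 3), "NUCLEUS", "CONCESSION", leaf1, inner)
```

Here `root.promotion()` contains only unit 1, because the span covering units 2 and 3 is a satellite of the root.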
    • 15. Foundation
    • 16. Foundation: Two problems to solve: 1. The problem of rhetorical grounding: Needs to show how, starting from free, unrestricted text, connectives and cohesion can be used to automatically determine the elementary units of text and hypothesize simple, extended, and exclusively disjunctive rhetorical relations that hold between these units and spans of units. 2. The problem of rhetorical structure derivation: The rhetorical structures must be consistent with the constraints given.
    • 17. Foundation: Not modeled as an incremental process in which elementary units are determined and attached to an increasingly complex RS-tree. Process: Determine elementary discourse units (edus) first. That knowledge of connectives and cohesion is then used to (over-)hypothesize simple, extended, and exclusively disjunctive rhetorical relations. These hypotheses and the well-constrained model of valid RS-trees are used to determine the set of valid rhetorical interpretations that are consistent with both the mathematical model and the hypotheses.
    • 18. Corpus Analysis: Prior to Marcu, no empirical data existed which could answer the question of the extent to which connectives could be used to identify elementary units and hypothesize rhetorical relations. So… corpus study. The corpus study was designed to: investigate how cue phrases can be used to identify the elementary units of texts; determine what rhetorical relations hold between units and spans of text, the nuclearity of the units, and the sizes of the related spans. Developed his own annotation scheme.
    • 19. Corpus Analysis: 450 potential markers from previous lists of potential discourse markers, cue phrases. Brown corpus examples for each: 300 words or so per example. Average of 17 text fragments per cue phrase. Overall, more than 7,600 randomly selected texts.
    • 20. Corpus Analysis: Hardcoding? “By encoding algorithmic specific information in the corpus, I only bootstrap the step that can take one from annotated data to algorithmic information.” Doesn’t preclude other methods.
    • 21. Corpus Analysis: Manually analyzed 2,100 of the text fragments in the corpus. Annotated only 2,100 fragments (time). Of these: 1,197 had a discourse function; 773 were sentential; 244 were pragmatic.
    • 22. Corpus Analysis: “I did not use an objective definition of elementary unit. Rather, I relied on a more intuitive one.” This corpus only used for development; testing was done on independent corpora with non-biased judges.
    • 23. Rhetorical Parsing Algorithm
    • 24. Rhetorical Parsing Algorithm
    • 25. Rhetorical Parsing Algorithm: Orthographic markers, such as commas, periods, dashes, paragraph breaks, etc., play an important role in our surface-based approach to discourse processing, so are included. “By considering only cue phrases having a discourse function in most of the cases, I deliberately chose to focus more on precision than on recall with respect to the task of identifying the elementary units of text.”
    • 26. Rhetorical Parsing Algorithm: Used Lex (Unix) to identify markers in text. Shallow analyzer then used to perform 11 different actions, associated with the discourse markers, as a foundation for identifying edus: NOTHING, NORMAL, COMMA, NORMAL_THEN_COMMA, END, MATCH_PARENS, COMMA_PAREN, MATCH_DASH, SET_AND/SET_OR, DUAL
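The marker-identification step on slide 26 can be illustrated with a small regular-expression scanner. This is a sketch in Python rather than Lex, and the table `MARKER_ACTIONS` is hypothetical: the action names come from the slide, but the pairing of each marker with an action is illustrative only and is not taken from the paper's corpus analysis.

```python
import re

# Hypothetical marker-to-action table. Action names are from the slide;
# which marker gets which action is an assumption made for illustration.
MARKER_ACTIONS = {
    "although": "COMMA",
    "because": "NORMAL",
    "however": "NOTHING",
    "for example": "NOTHING",
}

# Try longer markers first so multi-word markers are not shadowed by shorter ones.
_alternatives = sorted(MARKER_ACTIONS, key=len, reverse=True)
MARKER_RE = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in _alternatives) + r")\b",
    re.IGNORECASE,
)


def find_markers(text):
    """Return (offset, marker, action) triples in text order."""
    found = []
    for match in MARKER_RE.finditer(text):
        marker = match.group(1).lower()
        found.append((match.start(), marker, MARKER_ACTIONS[marker]))
    return found


hits = find_markers("Although it rained, we left because the bus came.")
```

A shallow analyzer would then walk the returned triples and apply each action to place unit boundaries; that second stage is not sketched here.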
    • 27. Rhetorical Parsing Algorithm
    • 28. Rhetorical Parsing Algorithm: Clause-like Unit Identification. On the basis of the information derived from the corpus, Marcu designed an algorithm that identifies elementary textual unit boundaries in sentences and cue phrases that have a discourse function.
    • 29. Rhetorical Parsing Algorithm
    • 30. Rhetorical Parsing Algorithm
    • 31. Rhetorical Parsing Algorithm: Used: an expository text of 5,036 words from Scientific American; a magazine article of 1,588 words from Time; a narration of 583 words from the Brown corpus (segment P25:1250-1710). Manually broken into elementary units by three slaves (coli masters students)
    • 32. Rhetorical Parsing Algorithm
    • 33. Rhetorical Parsing Algorithm
    • 34. Rhetorical Parsing Algorithm: 4.4.1 From Discourse Markers to Rhetorical Relations. For each regex for discourse markers, six features for each possible discourse role were annotated: Status (SAT_NUCLEUS, SATELLITE_NUC, NULL); Where to link (BEFORE, AFTER); Types (CLAUSE, SENTENCE, PARAGRAPH); Rhetorical relation (CONCESSION, ELABORATION &c.); Clause distance / Sentence distance; Distance to salient unit
    • 35. Rhetorical Parsing Algorithm: 4.4.2 A Discourse-marker-based Algorithm for Hypothesizing Rhetorical Relations. Iterate over all textual units (sentence, clause, paragraph). For each discourse marker, the algorithm constructs an exclusively disjunctive hypothesis concerning the rhetorical relations that the marker under scrutiny may signal. The algorithm assumes that the rhetorical structure at each level can be derived by hypothesizing rhetorical relations that hold between the units at that level. It generates simple and extended relations.
    • 36. Rhetorical Parsing Algorithm
    • 37. Rhetorical Parsing Algorithm: 4.4.3 A Word-co-occurrence-based Algorithm for Hypothesizing Rhetorical Relations. For paragraph-level analysis, discourse markers may not be enough. Here, cohesion, using word collocations, can be used to identify ELABORATION, BACKGROUND or JOINT relations.
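One way to picture the cohesion-based step on slide 37 is a bag-of-words similarity test between two adjacent spans. This is a minimal sketch under stated assumptions: the cosine measure, the `threshold` value, the tokenizer, and the rule that similar spans yield ELABORATION or BACKGROUND while dissimilar ones yield JOINT are all illustrative choices, not details confirmed by the slides.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately crude tokenizer."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def hypothesize(left, right, threshold=0.2):
    """Return candidate relations for two adjacent spans.
    The threshold and the mapping to relation names are assumptions."""
    similarity = cosine(bag_of_words(left), bag_of_words(right))
    if similarity >= threshold:
        return ["ELABORATION", "BACKGROUND"]
    return ["JOINT"]


related = hypothesize("The parser builds trees.",
                      "The parser evaluates trees quickly.")
unrelated = hypothesize("Cats sleep.", "Stocks fell sharply.")
```

Returning a list of candidates rather than a single label mirrors the idea that relations are over-hypothesized first and filtered later by the tree constraints.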
    • 38. Rhetorical Parsing Algorithm: 4.5 A Proof-Theoretic Account of the Problem of Rhetorical Structure Derivation. “Once the elementary units of a text have been determined and the rhetorical relations between them have been hypothesized at sentence, paragraph, and section levels, we need to determine the rhetorical structures that are consistent with these hypotheses and with the constraints specific to valid RS-trees. That is, we need to solve the problem of rhetorical structure derivation.”
    • 39. Rhetorical Parsing Algorithm: 4.5 A Proof-Theoretic Account of the Problem of Rhetorical Structure Derivation. “Once the elementary units of a text have been determined and the rhetorical relations between them have been hypothesized at sentence, paragraph, and section levels, we need to determine the rhetorical structures that are consistent with these hypotheses and with the constraints specific to valid RS-trees. That is, we need to solve the problem of rhetorical structure derivation.”
    • 40. Rhetorical Parsing Algorithm: 4.5 A Proof-Theoretic Account of the Problem of Rhetorical Structure Derivation. Marcu devised a proof theory to determine all valid rhetorical structures of a text. The theory consists of a set of axioms and rewriting rules that encode all possible ways in which one can derive the valid RS-trees of a text. More in Marcu (2000). Lots of ways to implement.
    • 41. Rhetorical Parsing Algorithm
    • 42. Rhetorical Parsing Algorithm: Ambiguity in Discourse: Many different possibilities (as in parsing). Best are skewed to the right (style of writing, placement of important information). Weights are “computed recursively by summing up the weights of the left and right branches of a rhetorical structure and the difference between the depth of the right and left branches of the structure. Hence, the more skewed to the right a tree is, the greater its weight w is.” To disambiguate: keeps only partial structures that lead to maximal weighting
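The recursive weight quoted on slide 42 can be written out directly. This is a minimal sketch that follows the quoted sentence literally; the nested-tuple tree encoding and the base cases (a leaf has weight 0 and depth 0) are assumptions, since the slide does not state them.

```python
def depth(tree):
    """Depth of a binary tree given as nested 2-tuples; leaves have depth 0 (assumed)."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return 1 + max(depth(left), depth(right))


def weight(tree):
    """w(tree) = w(left) + w(right) + depth(right) - depth(left); leaves weigh 0 (assumed)."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return weight(left) + weight(right) + depth(right) - depth(left)


# Three bracketings of the same four units.
right_skewed = (1, (2, (3, 4)))
balanced = ((1, 2), (3, 4))
left_skewed = (((1, 2), 3), 4)

# Disambiguation as described on the slide: keep the maximally weighted candidate.
best = max([left_skewed, balanced, right_skewed], key=weight)
```

With these base cases the right-skewed tree scores 3, the balanced tree 0, and the left-skewed tree -3, so `best` is the right-skewed bracketing.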
    • 43. Evaluation: Two ways to evaluate: Compare the automatically derived trees with trees that have been built manually. Evaluate the impact that they have on the accuracy of other natural language processing tasks, such as anaphora resolution, intention recognition, or text summarization.
    • 44. Evaluation
    • 45. Evaluation
    • 46. Evaluation: Qualitative: Good Discourse Structures at the Paragraph Level. Good Discourse Structures at the Text Level, for Short Texts. Good Discourse Structures for Sentences, Paragraphs, and Texts that Use Unambiguous Discourse Markers. Good Discourse Structures for Sentences that Use Markers Other than And. Bad Discourse Structures for Sentences that Use the Discourse Marker And. Incorrectly Labeled Intentional Relations. Bad Discourse Structures for Very Large Texts
    • 47. Evaluation: Summarization Algorithm: The algorithm uses the rhetorical parser described in this paper to determine the discourse structure of a text given as input, it uses the discourse structure to induce a partial ordering on the elementary units in the text, and then, depending on the desired compression rate, it selects the p most important units in the text. Corpora: 5 SA texts mentioned, and 40 articles from TREC collection.
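The selection step on slide 47 can be sketched once a discourse tree is available. This is an illustrative sketch, not the paper's summarizer: the tree encoding, the helper names, the scoring rule (a unit is ranked by the shallowest node that promotes it), and the tie-break by text order are all assumptions.

```python
def promotion(node):
    """Leaves (ints) promote themselves; an internal node promotes
    the promotion units of its nucleus ("N") children (assumed rule)."""
    if isinstance(node, int):
        return {node}
    units = set()
    for status, child in node:
        if status == "N":
            units |= promotion(child)
    return units


def rank_units(tree):
    """Map each unit to the depth of the shallowest node that promotes it.
    Smaller values mean more important units."""
    scores = {}

    def visit(node, level):
        for unit in promotion(node):
            scores.setdefault(unit, level)  # ancestors are visited first
        if not isinstance(node, int):
            for _, child in node:
                visit(child, level + 1)

    visit(tree, 0)
    return scores


def summarize(tree, p):
    """Pick the p most important units, returned in text order."""
    scores = rank_units(tree)
    chosen = sorted(scores, key=lambda u: (scores[u], u))[:p]
    return sorted(chosen)


# Internal nodes are pairs of (status, child); leaves are unit numbers.
tree = (("N", 1), ("S", (("N", 2), ("S", 3))))
summary = summarize(tree, 2)
```

For this tree the ranking is unit 1, then unit 2, then unit 3, so a two-unit summary keeps units 1 and 2.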
    • 48. Evaluation
    • 49. Related Work: Corston-Oliver (1998): Syntactic information to hypothesize relations. Sumita et al. (1992): Deep syntactic processing, constrained trees, no weighting or disambiguation. Kurohashi and Nagao (1994): discourse structure generator that builds discourse trees in an incremental fashion. Strube and Hahn (1999): building hierarchies of referential discourse elements. None of these were compared due to inapplicability.
    • 50. Conclusion: “Quantitative and qualitative analyses of the results show that many relations can be identified correctly within this framework.” “The brightest side of the story is that the results in this paper show that the rhetorical structures derived by my parser can be used successfully in the context of text summarization.”
