Katja Filippova

Transcript

  • 1. Automatic Text Summarization Katja Filippova filippova@eml-research.de EML Research gGmbH TU Darmstadt Text Summarization – 25.02.2009 – p. 1
  • 2. Text summarization
      • A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) (Hovy, 2003)
      • information retrieval
      • stock market prediction
      • generation of abstracts
      • online news summarization
      • ...
  • 3. Overview
      • Introduction
          • classification of summarization systems
          • abstraction vs. extraction
      • Text cohesion and coherence for summarization
          • graph based methods
          • discourse structure based methods
      • Document Understanding Conference
          • tasks
          • an example
      • Research directions
          • sentence fusion and compression
          • integrating world knowledge
  • 6. Text summarization: types
      • A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s) (Hovy, 2003)
      • Indicative
          • indicates types of information
          • “alerts”
      • Informative
          • includes quantitative/qualitative information
          • “informs”
      • Critic/evaluative
          • evaluates the content of the document
  • 7. Text summarization: types (INDICATIVE)
      • The work of Consumer Advice Centres is examined. The information sources used to support this work are reviewed. The recent closure of many CACs has seriously affected the availability of consumer information and advice. The contribution that public libraries can make in enhancing the availability of consumer information and advice both to the public and other agencies involved in consumer information and advice, is discussed.
  • 8. Text summarization: types (INFORMATIVE)
      • An examination of the work of Consumer Advice Centres and of the information sources and support activities that public libraries can offer. CACs have dealt with pre-shopping advice, education on consumers’ rights and complaints about goods and services, advising the client and often obtaining expert assessment. They have drawn on a wide range of information sources including case records, trade literature, contact files and external links. The recent closure of many CACs has seriously affected the availability of consumer information and advice. Libraries can cooperate closely with advice agencies through local coordinating committees, shared premises, joint publicity, referral and the sharing of professional expertise.
  • 11. Text summarization: types
      • Source: single-document vs. multi-document
          • research paper
          • proceedings of a conference
      • Content: generic vs. query-based vs. user-focused
          • equal coverage of all major topics
          • based on a question “what are the causes of the war?”
          • users interested in chemistry
      • Form: extract vs. abstract
          • fragments from the document
          • newly re-written text
  • 12. Extraction vs. abstraction
      How should a text summarization system proceed?
      • read the documents
      • understand them – build a semantic representation
      • generate a summary from this representation
  • 13. Extraction vs. abstraction
      • unfortunately, a rich semantic representation is not possible yet
      • to date, most summarization systems are extractive
      • usually, extraction units are sentences
      • low cost solution: could work without ontologies, complex representations, etc.
      • extractive summaries are usually incoherent
      • trade-off between non-redundancy and completeness
  • 14. Extraction vs. abstraction
      Three sentences from related documents (Oct. 27 2008):
      • The Syrian foreign minister today condemned the killing of eight civilians in a US raid as an act of "criminal and terrorist aggression". (The Guardian)
      • Syria accused the United States on Monday of carrying out a "terrorist aggression" after a deadly raid near its border with Iraq which it said killed eight civilians. (Reuters)
      • Lebanese President Michel Suleiman on Monday contacted his Syrian counterpart Bashar Assad to denounce "Sunday’s American aggression" against the Syrian village of Abu Kamal near the border with Iraq, local Elnashra website reported. (Aljazeera)
  • 22. Extraction vs. abstraction
      • extractive summaries are not coherent – sentences pulled out from different documents each make sense on their own but sound awkward when put together
      • unresolved pronouns may distort the meaning
      • beginning with a sentence which starts with However, ... is not a good idea
      • there is a striking difference from human-generated texts – pronouns and connectives are in the right place, the flow of discourse makes sense
      • How could one use this property of natural discourse for summarization?
  • 27. Text coherence vs. text cohesion
      • John enjoys playing the piano. John wants to become a famous piano player. John works hard and works hard every day. Working hard is necessary to become a famous piano player.
      • John enjoys playing the piano. However, he woke up early yesterday. But the day before yesterday the weather was wonderful, because rain and snow started immediately and continued the whole day through. By the way, his teacher did the same.
      • John enjoys playing the piano and wants to become famous. He works hard and does it every day because it is necessary for his goal.
  • 28. Text coherence vs. text cohesion
      • Text coherence represents the overall structure of a multi-sentence text in terms of macro-level relations between clauses or sentences (Halliday & Hasan, 1996).
          • Rhetorical Structure Theory (Mann & Thompson, 1988)
          • Discourse Representation Theory (Kamp, 1981)
          • Discourse Lexicalized Tree Adjoining Grammar (Forbes, 2001)
      • John enjoys playing the piano. [John wants to become a famous piano player.] (that’s why) [John works hard and works hard every day.] Working hard is necessary to become a famous piano player.
  • 29. Text coherence vs. text cohesion
      • Text cohesion involves relations between words, word senses, or referring expressions, which determine how tightly connected the text is (Halliday & Hasan, 1996).
          • anaphora, ellipsis, connectives
          • synonymy and other lexical relations
      • John enjoys playing the piano. However, he woke up early yesterday. But the day before yesterday the weather was wonderful, because rain and snow started immediately and continued the whole day through. By the way, his teacher did the same.
  • 30. Coherence based summarization
      • earlier systems considered technical documents and aimed at identifying important information by assigning weights to sentences (Luhn, 1958; Edmundson, 1969)
      • several weighted features were used:
          • word (stem) frequency
          • presence of cue words (e.g., as a result, significant) which signal important content
          • sentence position
          • document structure
      • feature weights were tuned manually
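The weighted-feature idea on slide 30 can be illustrated with a minimal sketch. It is not the method of Luhn or Edmundson: the stopword list, the normalisation, the `1 / (i + 1)` position feature, and the equal default weights are all assumptions chosen for illustration, and the document-structure feature is left out.

```python
import re
from collections import Counter

# Cue phrases taken from the slide; the stopword list is a toy assumption.
CUE_PHRASES = ("as a result", "significant")
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "that", "as"}

def tokenize(text):
    """Lowercase alphabetic tokens (no stemming, for simplicity)."""
    return re.findall(r"[a-z]+", text.lower())

def score_sentences(sentences, w_freq=1.0, w_cue=1.0, w_pos=1.0):
    """Score each sentence as a weighted sum of three hand-weighted features."""
    doc_counts = Counter(
        t for s in sentences for t in tokenize(s) if t not in STOPWORDS
    )
    scores = []
    for i, s in enumerate(sentences):
        content = [t for t in tokenize(s) if t not in STOPWORDS]
        # Feature 1: average document frequency of the sentence's content words.
        freq = sum(doc_counts[t] for t in content) / len(content) if content else 0.0
        # Feature 2: 1.0 if the sentence contains a cue phrase, else 0.0.
        cue = float(any(c in s.lower() for c in CUE_PHRASES))
        # Feature 3: earlier sentences get a higher position score.
        pos = 1.0 / (i + 1)
        scores.append(w_freq * freq + w_cue * cue + w_pos * pos)
    return scores

sentences = [
    "Summarization systems select important sentences.",
    "The weather was nice.",
    "As a result, sentence selection improves summarization.",
]
scores = score_sentences(sentences)
```

Changing `w_freq`, `w_cue`, and `w_pos` by hand mimics the manual tuning mentioned on the slide.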
  • 31. Coherence based summarization
      • Rhetorical Structure Theory (Mann & Thompson, 1988)
          • elaboration
          • example
          • contrast
          • background
          • motivation
          • etc.
      • Example annotated with the relations Circumstance and Attribution: "I am optimistic" said Mr. Smith as the market plunged. (from Sporleder & Lapata, 2005)
  • 32. Coherence based summarization
      • one could use discourse structure for summarization (Marcu, 2000)
      • however, this is not done often:
          • there are few discourse parsers and they are not very precise
          • it is debated whether a tree representation is sufficient for discourse (Wolf & Gibson, 2005)
          • it is not obvious how to classify rhetorical relations
          • some relations are argued to be anaphoric and not discourse (Webber et al., 2003)
  • 33. Cohesion based summarization
      • it is common to represent a text as a graph, where nodes are sentences and edges are some relations between them (e.g., discourse relations or just similarity)
      • a common graph connectivity assumption is that the nodes which are connected to many other nodes are likely to carry salient information
      • it is also assumed that nodes whose removal affects the structure of the document are important (Skorochodko, 1972 from Mani, 2001)
  • 37. Cohesion based summarization
      • modern approaches extend this idea and use PageRank (Brin & Page, 1998) to find salient nodes (Erkan & Radev, 2004; Mihalcea & Tarau, 2004) in such a graph
      • similar sentences are connected (bag-of-words similarity)
      • a similarity threshold is used
      • the top N sentences by PageRank score are extracted
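The three steps on slide 37 (connect similar sentences, apply a threshold, extract the top N by PageRank) can be sketched as follows. This is a simplified illustration and not the LexRank or TextRank implementation: the cosine similarity over raw word counts, the threshold of 0.1, the damping factor of 0.85, and the fixed number of iterations are assumptions.

```python
import math
import re
from collections import Counter

def bow(sentence):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rank_sentences(sentences, threshold=0.1, damping=0.85, iterations=50):
    """PageRank scores on an undirected, unweighted sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [bow(s) for s in sentences]
    # An edge links two sentences whose similarity reaches the threshold.
    neighbours = [
        [j for j in range(n) if j != i and cosine(vectors[i], vectors[j]) >= threshold]
        for i in range(n)
    ]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Isolated sentences only keep the (1 - damping) / n base score.
        rank = [
            (1 - damping) / n
            + damping * sum(rank[j] / len(neighbours[j]) for j in neighbours[i])
            for i in range(n)
        ]
    return rank

def top_n(sentences, n_top=2, **kwargs):
    """Return the n_top highest-ranked sentences in their original order."""
    rank = rank_sentences(sentences, **kwargs)
    best = sorted(range(len(sentences)), key=lambda i: rank[i], reverse=True)[:n_top]
    return [sentences[i] for i in sorted(best)]

docs = [
    "Syria condemned the US raid near the border.",
    "The US raid near the Syrian border killed eight civilians.",
    "Syria said the raid killed eight civilians.",
    "Bananas are yellow.",
]
ranks = rank_sentences(docs)
```

In this toy input the unrelated last sentence has no neighbours, so it receives the lowest score and is not extracted.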
  • 38. Coherence vs. cohesion based TS
      • Coherence:
          + transparent; coherence of the output can be improved
          – annotation of relations is still a challenge; preprocessing difficulties
      • Cohesion:
          + intuitively appealing; low-cost; even unsupervised
          – requires WSD*, anaphora resolution; hard to pin down; tuned thresholds
      * word sense disambiguation
  • 39. DUC competitions
      • Document Understanding Conferences (2000-2007)
      • from 2008 Text Analysis Conference (TAC)
      • provide participants with
          - a task
          - data
          - manual and automatic evaluation
      • increasing challenge in tasks: from generic single-document summarization to multi-document update summary (2008)
  • 40. DUC competitions
      Sample topic: D0740I
      round-the-world balloon flight
      Report on the planning, attempts and first successful balloon circumnavigation of the earth by Bertrand Piccard and his crew.
  • 41. DUC competitions
      <DOC>
      <DOCNO> APW19981112.0453 </DOCNO>
      <DOCTYPE> NEWS STORY </DOCTYPE>
      <DATE_TIME> 11/12/1998 08:21:00 </DATE_TIME>
      <HEADER> w1942 &Cx1f; wstm- r i &Cx13; &Cx11; BC-Switzerland-BalloonQu 11-12 0355 </HEADER>
      <BODY>
      <SLUG> BC-Switzerland-Balloon Quest </SLUG>
      <HEADLINE> Swiss challenger prepares third attempt at global record </HEADLINE>
      &UR; AP Photos GEV 101-102 &QL;
      <TEXT>
      GENEVA (AP) _ Swiss balloon pilot Bertrand Piccard and his new teammate, British flight engineer Tony Brown, said Thursday they will be ready later this month for a new attempt to fly nonstop round the world. Their new Breitling Orbiter 3 balloon will take off from Chateau d’Oex, in the Swiss Alps, as soon after Nov. 25 as weather conditions are favorable, they said. It will be Piccard’s third attempt to become the first to pilot a balloon around the world. In February the Swiss pilot, along with British flight engineer Andy Elson and
  • 42. The EML NLP group at DUC 2007
  • 43. Preprocessing: Annotation
      • Sentence splitting
      • Tokenization
      • PoS tagging
      • Chunking
      • Named entity recognition
  • 45. Preprocessing: Problems
      • Sentence splitting
        <sentence>At Pine Ridge, a scrolling marquee at Big Bat’s Texaco expressed both joy over Clinton’s visit and wariness of all the official attention: “Welcome President Clinton.</sentence>
        <sentence>Remember our treaties,” the sign read.
      • and cleaning
        <sentence>PINE RIDGE, S.D.</sentence>
        <sentence>(AP) - President Clinton turned the attention of his national poverty tour today to arguably the poorest, most forgotten U.S. citizens of them all: American Indians.</sentence>
  • 46. Preprocessing: Document filtering
      • Match topic with document extracts
      • Pick the top 5 matching documents
  • 47. Semantic analysis
      • Filter topic
      • Connect topic words with words in document sentences
      • Compute sentence scores from
          • matching words
          • matching word sequences
      • Result: a ranked list of sentences
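The scoring step on slide 47 is only outlined, so the following is a hypothetical sketch rather than the EML system. It assumes that "matching words" means shared word types, that "matching word sequences" means shared bigrams, and that the two counts are combined with the illustrative weights `w_word` and `w_seq`. Topic filtering (e.g., stopword removal) is omitted.

```python
import re

def tokens(text):
    """Lowercase alphabetic tokens; hyphenated words are split."""
    return re.findall(r"[a-z]+", text.lower())

def bigrams(toks):
    """Set of adjacent token pairs."""
    return set(zip(toks, toks[1:]))

def topic_score(topic, sentence, w_word=1.0, w_seq=2.0):
    """Weighted count of word types and bigrams shared with the topic."""
    t, s = tokens(topic), tokens(sentence)
    word_matches = len(set(t) & set(s))
    seq_matches = len(bigrams(t) & bigrams(s))
    return w_word * word_matches + w_seq * seq_matches

def rank_by_topic(topic, sentences):
    """Sentences sorted from best to worst topic match."""
    return sorted(sentences, key=lambda s: topic_score(topic, s), reverse=True)

topic = "round-the-world balloon flight"
candidates = [
    "The weather was favorable.",
    "The balloon took off from the Swiss Alps.",
    "Piccard completed the first round-the-world balloon flight.",
]
ranked = rank_by_topic(topic, candidates)
```

Because no stopwords are filtered here, a frequent word such as "the" also counts as a match; a real system would presumably filter it from the topic first.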
  • 48. Extractive summary generation
      • Rerank sentences
      • Select the top non-redundant sentences (250 word limit)
      • Re-arrange sentences
  • 49. A good summary
      Round-the-world balloon flight: Report on the planning, attempts and first successful balloon circumnavigation of the earth by Bertrand Piccard and his crew.
      Swiss balloon pilot Bertrand Piccard announced Wednesday that he has chosen Brian Jones as his teammate for his next attempt at circling the world in a balloon. Jones, 52, replaces fellow British flight engineer Tony Brown. Achieving what promoters called the last great milestone of aviation, Bertrand Piccard and Brian Jones joined legends like the Wright Brothers and Charles Lindbergh with Saturday’s completion of the first manned round-the-world balloon flight. At 4:54 a.m. EST Saturday, the two balloonists crossed the line of longitude from which they had departed on March 1 at Chateau D’Oex, Switzerland, ...
  • 50. A bad summary
      Angelina Jolie: What have been the most recent significant events in the life and career of actress Angelina Jolie?
      Angelina Jolie’s win for best supporting actress for her role in “Girl, Interrupted” came 21 years after father Jon Voight was awarded best actor for “Coming Home.“ ANGELINA JOLIE’S LIFE ON THE EDGE After all, her career is in overdrive. But Jolie cautions that she’s still a serious actress. It’s not like I’m suddenly a better actress because I have awards or this box office clout,” she says. “I am secure in the fact that I do have something to offer as an actress,”Jolie says. ‘...
  • 51. Evaluation
      • automatic evaluation with ROUGE (Lin, 2004)
      • manual evaluation with respect to
          • responsiveness
          • linguistic quality
              1. grammaticality
              2. non-redundancy
              3. referential clarity
              4. focus
              5. structure and coherence
      • our system scored above the average, top 5 for non-redundancy and coherence (recall the document filtering stage)
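To give an idea of what automatic evaluation with ROUGE measures, here is a simplified sketch of n-gram recall against reference summaries. It is not the official ROUGE toolkit: the tokenization is an assumption, and the toolkit's preprocessing options and other ROUGE variants are left out.

```python
import re
from collections import Counter

def ngrams(text, n):
    """Multiset of word n-grams in a text."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def rouge_n_recall(candidate, references, n=1):
    """Share of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram occurring twice in a reference counts
    twice only if the candidate also contains it at least twice.
    """
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0

reference = "the balloon circled the earth"
candidate = "the balloon circled earth"
r1 = rouge_n_recall(candidate, [reference], n=1)
r2 = rouge_n_recall(candidate, [reference], n=2)
```

For this pair, four of the five reference unigrams and two of the four reference bigrams are covered by the candidate.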
  • 55. Research directions
      • like in information retrieval, query expansion is expected to improve recall
          • WordNet (Fellbaum, 1998) for similarity
          • Wikipedia for relatedness (Strube & Ponzetto, 2006)
          • paraphrases
      • coreference resolution is needed for preprocessing, otherwise, e.g., pronouns are filtered as stopwords
      • relevance vs. redundancy issue: in multi-document summarization (MDS), how can we ensure non-redundancy of the summary? (Carbonell & Goldstein, 1998)
      • sentence ordering for extractive MDS (Barzilay & Lapata, 2005)
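The relevance vs. redundancy trade-off can be illustrated with a greedy selection in the spirit of Maximal Marginal Relevance (Carbonell & Goldstein, 1998). This is a sketch under assumptions: similarity is cosine over raw word counts, and the trade-off parameter `lam` and the summary size `k` are illustrative values.

```python
import math
import re
from collections import Counter

def bow(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def mmr_select(query, sentences, lam=0.5, k=2):
    """Greedily pick k sentences, trading query relevance against redundancy."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            relevance = cosine(vecs[i], q)
            # Redundancy: similarity to the closest already selected sentence.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]

query = "balloon flight around the world"
pool = [
    "Piccard completed a balloon flight around the world.",
    "Piccard completed a balloon flight around the world on Saturday.",
    "The balloon departed from Switzerland.",
]
balanced = mmr_select(query, pool, lam=0.5, k=2)
relevance_only = mmr_select(query, pool, lam=1.0, k=2)
```

With `lam=1.0` only relevance counts, so the near-duplicate second sentence is selected; with `lam=0.5` it is penalised and the third sentence is chosen instead.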
  • 56. Directions of research
      • abstractive summarization is a distant goal but there are ways to go beyond sentence extraction
          • sentence compression
          • sentence fusion
  • 59. Sentence compression
      This is true, regardless of the opinion that some people have of Syria, and of their unhappiness at Syria’s presence in Lebanon.
      • summarization on the sentence level
      • in principle, a compression can be different from the input (different wording and structure)
      • to date, most systems use word deletion only
      • a compression corpus is now available online: http://homepages.inf.ed.ac.uk/s0460084/data
      • the performance can be evaluated automatically
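The "word deletion only" setting means that a compression is a subsequence of the input words. A small sketch makes this property checkable; the compressed sentence below is a hypothetical example, not one taken from the slides or from the corpus, and the tokenization is an assumption.

```python
import re

def words(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())

def is_word_deletion(original, compressed):
    """True if `compressed` keeps a subset of the original words in order."""
    remaining = iter(words(original))
    # `w in remaining` advances the iterator, so word order is enforced.
    return all(w in remaining for w in words(compressed))

def compression_rate(original, compressed):
    """Fraction of the original words that are kept."""
    return len(words(compressed)) / len(words(original))

original = (
    "This is true, regardless of the opinion that some people have of Syria, "
    "and of their unhappiness at Syria's presence in Lebanon."
)
# Hypothetical compression, invented for illustration.
compressed = "This is true regardless of the opinion some people have of Syria."
valid = is_word_deletion(original, compressed)
rate = compression_rate(original, compressed)
```

A compression that rewords or reorders the input, as allowed "in principle" on the slide, would fail this check.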
  • 61. Sentence fusion
      1 John Smith, born November 15 1900, studied chemistry and physics at the University of London.
      2 From 1917 Mr. Smith studied at the University of London and in 1921 he graduated with distinction.
      Fusion result: Mr. Smith studied chemistry and physics at the University of London from 1917.
      • pieces of related sentences are used to generate a novel sentence
      • can be seen as a middle ground between extractive and abstractive summarization
      • addresses the incompleteness-redundancy problem
  • 62. Thank you! (FOR YOUR ATTENTION)
  • 63. References
      • R. Barzilay & M. Lapata, 2005: Modeling local coherence: An entity-based approach
      • S. Brin & L. Page, 1998: The anatomy of a large-scale hypertextual web search engine
      • J. G. Carbonell & J. Goldstein, 1998: The use of MMR, diversity-based reranking for reordering documents and producing summaries
      • H. P. Edmundson, 1969: New methods in automatic extracting
      • G. Erkan & D. Radev, 2004: LexRank: Graph-based lexical centrality as salience in text summarization
      • C. Fellbaum, 1998: WordNet: An electronic lexical database
  • 64. References
      • K. Forbes, E. Miltsakaki, R. Prasad, A. Sarkar, A. Joshi, B. L. Webber, 2001: DLTAG system – discourse parsing with a Lexicalized Tree Adjoining Grammar
      • M. Halliday & R. Hasan, 1996: Cohesion in text
      • E. H. Hovy, 2003: Text summarization
      • H. Kamp, 1981: A theory of truth and semantic representation
      • C.-Y. Lin, 2004: Automatic evaluation of summaries using N-gram co-occurrence statistics
      • H. P. Luhn, 1958: The automatic creation of literature abstracts
      • I. Mani, 2001: Automatic summarization
  • 65. References
      • W. C. Mann & S. A. Thompson, 1988: Rhetorical structure theory. Towards a functional theory of text organization
      • D. Marcu, 2000: The theory and practice of discourse parsing and summarization
      • R. Mihalcea & P. Tarau, 2004: TextRank: Bringing order into text
      • E. Skorochodko, 1972: Adaptive method of automatic abstracting and indexing
      • C. Sporleder & M. Lapata, 2005: Discourse chunking and its application to sentence compression
      • M. Strube & S. P. Ponzetto, 2006: WikiRelate! Computing semantic relatedness using Wikipedia
  • 66. References
      • B. L. Webber, M. Stone, A. Joshi, A. Knott, 2003: Anaphora and discourse structure
      • F. Wolf & E. Gibson, 2005: Representing discourse coherence: A corpus-based study