Text summarization


Published on

Published in: Education, Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Text summarization

  1. 1. Text Summarization - Machine Learning TEXT SUMMARIZATION1 Kareem El-Sayed Hashem Mohamed Mohsen Brary
  2. 2. TEXT SUMMARIZATION Goal: reducing a text with a computer program in order to create a summary that retains the most important points of the original text. Text Summarization - Machine Learning Summarization Applications  summaries of email threads  action items from a meeting  simplifying text by compressing sentences 2
  3. 3. WHAT TO SUMMARIZE?SINGLE VS. MULTIPLE DOCUMENTS Single Document Summarization  Given a single document produce Text Summarization - Machine Learning  Abstract  Outline  Headline Multiple Document Summarization  Given a group of document produce a gist of the document  A series of news stories of the same event  A set of webpages about some topic or question 3
  4. 4. QUERY-FOCUSED SUMMARIZATION& GENERIC SUMMARIZATION Generic Summarization  Summarize the content of a document Text Summarization - Machine Learning Query-focused Summarization  Summarize a document with respect to an information need expressed in a user query  A kind of complex question answering  Answer a question by summarizing a document that has the information to construct the answer 4
  5. 5. SUMMARIZATION FOR QUESTIONANSWERING: Snippets  Create snippets summarizing a web page for a query Text Summarization - Machine Learning Multiple Documents  Create answer to complex questions summarizing multiple documents.  Instead of giving a snippet for each document  Create a cohesive answer that combines information from each document 5
  6. 6. EXTRACTIVE SUMMARIZATION& ABSTRACTIVE SUMMARIZATION Extractive Summarization:  Create the summary from phrases or sentences in the source document(s) Text Summarization - Machine Learning Abstractive Summarization  Express the ideas in the source document using different words 6
  7. 7. SUMMARIZATION: THREE STAGES Content Selection: choose sentences to extract from the document Text Summarization - Machine Learning Information Ordering: choose an order to place them in the summary Sentence Realization: clean up the sentence 7
  8. 8. UNSUPERVISED CONTENT SELECTION Intuition Dating Back to Luhn (1958):  Choose sentences that have distinguished or informative words Text Summarization - Machine Learning Two Approaches to Define distinguished words  tf-idf: weigh each word wi in document j by tf-idf  Topic signature: choose smaller set of distinguished words  Log-likelihood ratio (LLR) 8
  9. 9. TOPIC SIGNATURE-BASED CONTENTSELECTION WITH QUERIES Choose words that are informative either  By log-likelihood ratio (LLR) Text Summarization - Machine Learning  Or by appearing in the query  Weigh a sentence by weight of its words: 9
  10. 10. SUPERVISED CONTENT SELECTION Given  A labeled training set of good summaries for each document Text Summarization - Machine Learning Align  The sentences in the document with sentences in the summary Extract Features  Position  Length of sentence  Word informativeness  Cohesion 10
  11. 11. SUPERVISED CONTENT SELECTION Train  A binary classifier (put sentence in summary? Yes or no) Text Summarization - Machine Learning Problems  Hard to get labeled training data  Alignment is difficult  Performance not better that unsupervised algorithm 11
  12. 12. EVALUATING SUMMARIES: ROUGE ROUGE “ Recall Oriented Understudy for Gisting Evaluation ” Text Summarization - Machine Learning Internal metric for automatically evaluating summaries  Based on BLEU (a metric used for machine translation)  Not as good as human evaluation.  But much more convenient 12
  13. 13. EVALUATING SUMMARIES: ROUGE Given a document D, and an automatic summary X: Text Summarization - Machine Learning  Have N humans produce a set of reference summaries of D  Run System, giving automatic summary X  What percentage of the bigrams from the reference summaries appear in X? 13
  14. 14. EXAMPLE Human 1: water spinach is a green leafy vegetable grown in the tropics. Text Summarization - Machine Learning Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable. Human 3: water spinach is a commonly eaten leaf vegetable of Asia. System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia. ROUGE -2= = 12/28 = 0.43 14
  15. 15. ANSWERING HARDER QUESTION:QUERY-FOCUSED MULTI-DOCUMENTSUMMARIZATION The (bottom-up) snippet method  Find a set of relevant documents Text Summarization - Machine Learning  Extract informative sentences form the documents  Order and modify the sentences into an answer The(top-down) information extraction method  Build specific answers for different questions types:  Definition questions  Biography questions  Certain medical questions 15
  16. 16. QUERY-FOCUSED MULTI-DOCUMENTSUMMARIZATION Text Summarization - Machine Learning 16
  17. 17. MAXIMAL MARGINAL RELEVANCE (MMR) An iterative method for content selection from multiple documents Text Summarization - Machine Learning Iteratively (greedily) choose the best sentence to insert in the summary/answer so far:  Relevant: maximally relevant to the user query  High cosine similarity to the query  Novel: minimally redundant with the summary so far:  Low cosine similarity to the summary 17 Stop when desired length
  18. 18. LLR + MMR CHOOSING INFORMATIVE YETNON-REDUNDANT SENTENCES One of many ways to combine the intuitions of LLR and MMR: Text Summarization - Machine Learning  Score each sentence based on LLR (including query words)  Include the sentence with highest score in the summary  Iteratively add into the summary high-scoring sentences that are not redundant with the summary so far. 18
  19. 19. INFORMATION ORDERING Chronological ordering:  Order sentences by the date of the document “ for summarizing news” Text Summarization - Machine Learning Coherence:  Choose ordering that make neighboring sentences similar(by cosine similarity)  Choose ordering in which neighboring sentences discuss the same entity Topical ordering 19  Learn the ordering of topics in the source document
  20. 20. DOMAIN-SPECIFIC ANSWERING:THE INFORMATION EXTRACTION METHOD A good biography of a person contains:  A person’s birth/death, fame factor, education …etc Text Summarization - Machine Learning A good definition contains  Type or category “ The Hajj is a type of ritual ” A medical answer about a drug’s use contains:  The problem : medical condition  The intervention : drug or procedure  The outcome : the result of the study 20
  21. 21. INFORMATION THAT SHOULD BE IN THEANSWER FOR 3 KINDS OF QUESTIONS Text Summarization - Machine Learning 21
  22. 22. ARCHITECTURE FOR ANSWERING COMPLEXQUESTIONS Text Summarization - Machine Learning 22
  23. 23. Text Summarization - Machine Learning 23 NLP Stanford course.REFERENCES: 
  24. 24. Text Summarization - Machine Learning THANK YOU  24