Text summarization


Published in: Education, Technology

  • 1. TEXT SUMMARIZATION (Text Summarization - Machine Learning). Kareem El-Sayed Hashem Mohamed Mohsen Brary
  • 2. TEXT SUMMARIZATION Goal: reducing a text with a computer program in order to create a summary that retains the most important points of the original text. Text Summarization - Machine Learning Summarization Applications  summaries of email threads  action items from a meeting  simplifying text by compressing sentences 2
  • 3. WHAT TO SUMMARIZE? SINGLE VS. MULTIPLE DOCUMENTS. Single-document summarization: given a single document, produce an abstract, an outline, or a headline. Multiple-document summarization: given a group of documents, produce a gist of their content, for example for a series of news stories about the same event or a set of web pages about some topic or question.
  • 4. QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION. Generic summarization: summarize the content of a document. Query-focused summarization: summarize a document with respect to an information need expressed in a user query. This is a kind of complex question answering: answer a question by summarizing a document that has the information needed to construct the answer.
  • 5. SUMMARIZATION FOR QUESTION ANSWERING. Snippets: create snippets summarizing a web page for a query. Multiple documents: create answers to complex questions by summarizing multiple documents; instead of giving a snippet for each document, create a cohesive answer that combines information from each document.
  • 6. EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION. Extractive summarization: create the summary from phrases or sentences in the source document(s). Abstractive summarization: express the ideas in the source document using different words.
  • 7. SUMMARIZATION: THREE STAGES. Content selection: choose sentences to extract from the document. Information ordering: choose an order in which to place them in the summary. Sentence realization: clean up the sentences.
  • 8. UNSUPERVISED CONTENT SELECTION. Intuition dating back to Luhn (1958): choose sentences that have distinguished or informative words. Two approaches to defining distinguished words: tf-idf (weigh each word w_i in document j by its tf-idf score); topic signature (choose a smaller set of distinguished words, using the log-likelihood ratio, LLR).
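The tf-idf approach can be sketched roughly as follows. This is a minimal illustration, not the method from the slides: the function names, the regex tokenizer, the use of a separate `background_docs` collection for idf, and averaging the word weights over the sentence length are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_sentence_scores(sentences, background_docs):
    """Score each sentence by the average tf-idf weight of its words.

    tf is counted over the document formed by `sentences`; idf is estimated
    from `background_docs` plus that document (so df is never zero).
    """
    doc_tokens = [tokenize(s) for s in sentences]
    tf = Counter(tok for toks in doc_tokens for tok in toks)

    collection = [set(tokenize(d)) for d in background_docs]
    collection.append(set(tf))
    n_docs = len(collection)

    def idf(word):
        df = sum(1 for doc in collection if word in doc)
        return math.log(n_docs / df)

    scores = []
    for toks in doc_tokens:
        if not toks:
            scores.append(0.0)
            continue
        scores.append(sum(tf[w] * idf(w) for w in toks) / len(toks))
    return scores
```

Sentences made up of words that are rare in the background collection receive higher scores; a word that occurs in every document contributes nothing.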
  • 9. TOPIC SIGNATURE-BASED CONTENT SELECTION WITH QUERIES. Choose words that are informative either by log-likelihood ratio (LLR) or by appearing in the query. Weigh a sentence by the weight of its words.
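A rough sketch of this idea, under stated assumptions: a word counts as informative (weight 1) if its log-likelihood ratio against a background corpus exceeds a threshold or if it appears in the query, and a sentence's weight is the average of its word weights. The threshold of 10, the averaging, and all function names are assumptions for illustration; the slide does not spell out the exact formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def _log_binom(k, n, p):
    # Binomial log-likelihood without the coefficient (it cancels in the ratio).
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p = (k1 + k2) / (n1 + n2)
    p1, p2 = k1 / n1, k2 / n2
    return 2 * (_log_binom(k1, n1, p1) + _log_binom(k2, n2, p2)
                - _log_binom(k1, n1, p) - _log_binom(k2, n2, p))


def topic_signature(input_text, background_text, threshold=10.0):
    # Keep words that are both over-represented in the input and above threshold.
    inp, bg = Counter(tokenize(input_text)), Counter(tokenize(background_text))
    n1, n2 = sum(inp.values()), sum(bg.values())
    return {w for w, k1 in inp.items()
            if k1 / n1 > bg[w] / n2 and llr(k1, n1, bg[w], n2) > threshold}


def sentence_weight(sentence, signature_words, query):
    # Average of 0/1 word weights: 1 if the word is in the signature or the query.
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    query_words = set(tokenize(query))
    hits = sum(1 for t in tokens if t in signature_words or t in query_words)
    return hits / len(tokens)
```

Both texts passed to `topic_signature` are assumed to be non-empty.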
  • 10. SUPERVISED CONTENT SELECTION. Given: a labeled training set of good summaries for each document. Align: the sentences in the document with sentences in the summary. Extract features: position; length of sentence; word informativeness; cohesion.
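A minimal sketch of the feature-extraction step might look like this. The feature definitions are assumptions for illustration (relative position in the document, token count, and the fraction of tokens found in a given set of informative words); cohesion is left out because the slide does not define it.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentences, informative_words):
    """Return one feature dict per sentence (hypothetical feature definitions)."""
    features = []
    n = len(sentences)
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        informative = sum(1 for t in tokens if t in informative_words)
        features.append({
            # 0.0 for the first sentence, 1.0 for the last.
            "position": i / (n - 1) if n > 1 else 0.0,
            "length": len(tokens),
            "informativeness": informative / len(tokens) if tokens else 0.0,
        })
    return features
```

Each feature dict, paired with a yes/no label obtained from the alignment, would form one training example for a sentence classifier.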
  • 11. SUPERVISED CONTENT SELECTION. Train: a binary classifier (put the sentence in the summary? yes or no). Problems: hard to get labeled training data; alignment is difficult; performance is not better than that of unsupervised algorithms.
  • 12. EVALUATING SUMMARIES: ROUGE. ROUGE: "Recall-Oriented Understudy for Gisting Evaluation". An intrinsic metric for automatically evaluating summaries. Based on BLEU (a metric used for machine translation). Not as good as human evaluation, but much more convenient.
  • 13. EVALUATING SUMMARIES: ROUGE. Given a document D and an automatic summary X: have N humans produce a set of reference summaries of D; run the system, giving automatic summary X; ask what percentage of the bigrams from the reference summaries appear in X.
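One common way to write this bigram recall as a formula is shown below. The notation is an assumption, since the slide gives only the verbal description: X is the system summary, Refs is the set of reference summaries, and count(b, S) is the number of times bigram b occurs in S.

```latex
\[
\text{ROUGE-2}
  = \frac{\displaystyle\sum_{S \in \text{Refs}} \; \sum_{b \in \text{bigrams}(S)}
          \min\bigl(\text{count}(b, X),\, \text{count}(b, S)\bigr)}
         {\displaystyle\sum_{S \in \text{Refs}} \; \sum_{b \in \text{bigrams}(S)}
          \text{count}(b, S)}
\]
```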
  • 14. EXAMPLE Human 1: water spinach is a green leafy vegetable grown in the tropics. Text Summarization - Machine Learning Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable. Human 3: water spinach is a commonly eaten leaf vegetable of Asia. System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia. ROUGE -2= = 12/28 = 0.43 14
  • 15. ANSWERING HARDER QUESTIONS: QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION. The (bottom-up) snippet method: find a set of relevant documents; extract informative sentences from the documents; order and modify the sentences into an answer. The (top-down) information extraction method: build specific answers for different question types, such as definition questions, biography questions, and certain medical questions.
  • 16. QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION
  • 17. MAXIMAL MARGINAL RELEVANCE (MMR). An iterative method for content selection from multiple documents. Iteratively (greedily) choose the best sentence to insert in the summary/answer so far. Relevant: maximally relevant to the user query (high cosine similarity to the query). Novel: minimally redundant with the summary so far (low cosine similarity to the summary). Stop when the summary reaches the desired length.
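A minimal sketch of greedy MMR selection, under several assumptions: sentences and the query are represented as raw bag-of-words counts, redundancy is measured as the maximum cosine similarity to any sentence already chosen, the desired length is expressed as a sentence count, and the trade-off parameter `lam` and its default are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, max_sentences, lam=0.7):
    """Greedily pick sentences that are relevant to the query and novel."""
    q = bow(query)
    vectors = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))

    def mmr_score(i):
        relevance = cosine(vectors[i], q)
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < max_sentences:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam` close to 1 the selection favors relevance to the query; lower values penalize sentences that repeat what the summary already contains.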
  • 18. LLR + MMR: CHOOSING INFORMATIVE YET NON-REDUNDANT SENTENCES. One of many ways to combine the intuitions of LLR and MMR: score each sentence based on LLR (including query words); include the sentence with the highest score in the summary; iteratively add to the summary high-scoring sentences that are not redundant with the summary so far.
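The selection loop can be sketched as follows, assuming the per-sentence scores have already been computed. Redundancy is checked here with Jaccard word overlap as a simple stand-in for cosine similarity; the `max_overlap` threshold and the function names are assumptions for illustration.

```python
import re


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_informative(sentences, scores, max_sentences, max_overlap=0.5):
    """Take sentences in descending score order, skipping redundant ones."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Keep the sentence only if it overlaps little with every chosen one.
        if all(jaccard(words(sentences[i]), words(sentences[j])) <= max_overlap
               for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in chosen]
```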
  • 19. INFORMATION ORDERING. Chronological ordering: order sentences by the date of the document (for summarizing news). Coherence: choose an ordering that makes neighboring sentences similar (by cosine similarity); choose an ordering in which neighboring sentences discuss the same entity. Topical ordering: learn the ordering of topics in the source documents.
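Chronological ordering, the simplest of these strategies, can be sketched in a few lines. The input layout, a list of `(sentence, document_date, position_in_document)` tuples, and the tie-break on position within the document are assumptions made for the example.

```python
from datetime import date


def chronological_order(selected):
    """Order selected sentences by document date, then by position in the document.

    `selected` is a list of (sentence, document_date, position_in_document).
    """
    ordered = sorted(selected, key=lambda item: (item[1], item[2]))
    return [sentence for sentence, _, _ in ordered]


items = [
    ("Rescue teams arrived the next morning.", date(2012, 5, 2), 0),
    ("The storm hit the coast on Tuesday.", date(2012, 5, 1), 3),
    ("Power was restored by noon.", date(2012, 5, 2), 1),
]
summary = chronological_order(items)
```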
  • 20. DOMAIN-SPECIFIC ANSWERING: THE INFORMATION EXTRACTION METHOD. A good biography of a person contains: the person's birth/death, fame factor, education, etc. A good definition contains: the type or category ("The Hajj is a type of ritual"). A medical answer about a drug's use contains: the problem (medical condition); the intervention (drug or procedure); the outcome (the result of the study).
  • 22. ARCHITECTURE FOR ANSWERING COMPLEX QUESTIONS
  • 23. REFERENCES: NLP Stanford course.
  • 24. THANK YOU