Text summarization

Published in: Education, Technology

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,686
On Slideshare
0
From Embeds
0
Number of Embeds
10
Actions
Shares
0
Downloads
137
Comments
0
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. TEXT SUMMARIZATION
      Text Summarization - Machine Learning
      Kareem El-Sayed Hashem Mohamed Mohsen Brary
  • 2. TEXT SUMMARIZATION
      Goal: reduce a text with a computer program to create a summary that retains the most important points of the original text.
      Summarization applications:
        • Summaries of email threads
        • Action items from a meeting
        • Simplifying text by compressing sentences
  • 3. WHAT TO SUMMARIZE? SINGLE VS. MULTIPLE DOCUMENTS
      Single-document summarization: given a single document, produce
        • An abstract
        • An outline
        • A headline
      Multiple-document summarization: given a group of documents, produce a gist of their content, for example
        • A series of news stories about the same event
        • A set of web pages about some topic or question
  • 4. QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION
      Generic summarization:
        • Summarize the content of a document
      Query-focused summarization:
        • Summarize a document with respect to an information need expressed in a user query
        • A kind of complex question answering
        • Answer a question by summarizing a document that has the information to construct the answer
  • 5. SUMMARIZATION FOR QUESTION ANSWERING
      Snippets:
        • Create snippets summarizing a web page for a query
      Multiple documents:
        • Create answers to complex questions by summarizing multiple documents
        • Instead of giving a snippet for each document, create a cohesive answer that combines information from each document
  • 6. EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION
      Extractive summarization:
        • Create the summary from phrases or sentences in the source document(s)
      Abstractive summarization:
        • Express the ideas in the source document using different words
  • 7. SUMMARIZATION: THREE STAGES
      • Content selection: choose sentences to extract from the document
      • Information ordering: choose an order in which to place them in the summary
      • Sentence realization: clean up the sentences
  • 8. UNSUPERVISED CONTENT SELECTION
      Intuition dating back to Luhn (1958):
        • Choose sentences that have distinguished or informative words
      Two approaches to defining distinguished words:
        • tf-idf: weigh each word wi in document j by its tf-idf score
        • Topic signature: choose a smaller set of distinguished words
            • Log-likelihood ratio (LLR)
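The tf-idf approach on slide 8 can be sketched as follows. This is a minimal illustration, not the course's implementation: the tokenizer, the use of a small background collection to estimate idf, and averaging word weights per sentence are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_sentence_scores(sentences, background_docs):
    """Score each sentence by the average tf-idf weight of its words.

    tf is counted over the document being summarized (all `sentences`);
    idf is estimated from `background_docs` plus the document itself
    (an assumption: any reference collection could be used instead).
    """
    doc_tokens = [tokenize(s) for s in sentences]
    tf = Counter(w for tokens in doc_tokens for w in tokens)

    collection = [set(tokenize(d)) for d in background_docs]
    collection.append(set(tf))  # the document itself, so df >= 1 for its words
    n_docs = len(collection)

    def idf(word):
        df = sum(1 for doc in collection if word in doc)
        return math.log(n_docs / df)

    scores = []
    for tokens in doc_tokens:
        if not tokens:
            scores.append(0.0)
            continue
        scores.append(sum(tf[w] * idf(w) for w in tokens) / len(tokens))
    return scores
```

Sentences dominated by words that are rare in the background collection receive higher scores, which is the Luhn-style intuition the slide describes.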
  • 9. TOPIC SIGNATURE-BASED CONTENT SELECTION WITH QUERIES
      Choose words that are informative either
        • By log-likelihood ratio (LLR)
        • Or by appearing in the query
      Weigh a sentence by the weights of its words
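The weighting formula itself did not survive in this transcript. One common formulation, sketched here under stated assumptions, gives a word weight 1 if its LLR score exceeds a threshold or if it appears in the query, and 0 otherwise, then scores a sentence by the average weight of its words. The threshold of 10, the function names, and the precomputed `llr` dictionary are assumptions for illustration only.

```python
def word_weight(word, llr, query_words, threshold=10.0):
    """Return 1.0 for topic-signature or query words, else 0.0.

    `llr` maps words to precomputed log-likelihood ratio scores;
    the threshold value is an assumption, not taken from the slides.
    """
    if llr.get(word, 0.0) > threshold or word in query_words:
        return 1.0
    return 0.0


def sentence_weight(tokens, llr, query_words, threshold=10.0):
    """Average the word weights over all tokens of the sentence."""
    if not tokens:
        return 0.0
    total = sum(word_weight(t, llr, query_words, threshold) for t in tokens)
    return total / len(tokens)
```

For example, with `llr = {"spinach": 25.0, "the": 0.3}` and the query word `"vegetable"`, the tokens `["the", "spinach", "vegetable", "grows"]` contain two informative words out of four, giving a sentence weight of 0.5.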
  • 10. SUPERVISED CONTENT SELECTION
      Given:
        • A labeled training set of good summaries for each document
      Align:
        • The sentences in the document with sentences in the summary
      Extract features:
        • Position
        • Length of sentence
        • Word informativeness
        • Cohesion
  • 11. SUPERVISED CONTENT SELECTION
      Train:
        • A binary classifier (put the sentence in the summary? yes or no)
      Problems:
        • Hard to get labeled training data
        • Alignment is difficult
        • Performance is not better than that of unsupervised algorithms
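As a rough sketch of the feature-extraction step on slides 10 and 11, the function below turns each sentence into a small feature dictionary that could be fed to any binary classifier. The feature definitions (relative position, token count, fraction of informative words) and the `informative_words` set are assumptions for illustration; the cohesion feature and the classifier itself are left out.

```python
def sentence_features(sentences, informative_words):
    """Build one feature dict per sentence.

    position:        0.0 for the first sentence, 1.0 for the last
    length:          number of whitespace-separated tokens
    informativeness: fraction of tokens found in `informative_words`
    """
    n = len(sentences)
    rows = []
    for i, sentence in enumerate(sentences):
        tokens = [t.strip(".,;:!?") for t in sentence.lower().split()]
        informative = sum(1 for t in tokens if t in informative_words)
        rows.append({
            "position": i / (n - 1) if n > 1 else 0.0,
            "length": len(tokens),
            "informativeness": informative / len(tokens) if tokens else 0.0,
        })
    return rows
```

Each row would be paired with a yes/no label obtained from the alignment step, which is where the slide notes that labeled data is hard to come by.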
  • 12. EVALUATING SUMMARIES: ROUGE
      ROUGE: "Recall-Oriented Understudy for Gisting Evaluation"
      An intrinsic metric for automatically evaluating summaries:
        • Based on BLEU (a metric used for machine translation)
        • Not as good as human evaluation
        • But much more convenient
  • 13. EVALUATING SUMMARIES: ROUGE
      Given a document D and an automatic summary X:
        • Have N humans produce a set of reference summaries of D
        • Run the system, giving automatic summary X
        • What percentage of the bigrams from the reference summaries appear in X?
  • 14. EXAMPLE
      Human 1: water spinach is a green leafy vegetable grown in the tropics.
      Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable.
      Human 3: water spinach is a commonly eaten leaf vegetable of Asia.
      System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
      ROUGE-2 = 12/28 = 0.43
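The bigram-recall question from slide 13 can be sketched in a few lines. This is an illustrative simplification that assumes lowercasing, whitespace tokenization, and stripping of periods; real ROUGE implementations offer more options (stemming, stopword handling, and so on).

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs after lowercasing and dropping periods."""
    tokens = text.lower().replace(".", "").split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(references, system):
    """Return (matched, total) bigram counts over all reference summaries."""
    system_bigrams = bigrams(system)
    matched = total = 0
    for reference in references:
        reference_bigrams = bigrams(reference)
        total += sum(reference_bigrams.values())
        # Clip each match by how often the bigram occurs in the system summary.
        matched += sum(min(count, system_bigrams[bg])
                       for bg, count in reference_bigrams.items())
    return matched, total


references = [
    "water spinach is a green leafy vegetable grown in the tropics.",
    "water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "water spinach is a commonly eaten leaf vegetable of Asia.",
]
system = ("water spinach is a leaf vegetable commonly eaten "
          "in tropical areas of Asia.")

matched, total = rouge_2(references, system)
print(matched, total, round(matched / total, 2))
```

With this tokenizer the sketch finds 12 matching bigrams (3 + 3 + 6), in line with the slide's numerator, but counts 29 reference bigrams rather than 28, giving about 0.41. The small difference presumably comes from how the reference bigrams were counted in the original example.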
  • 15. ANSWERING HARDER QUESTIONS: QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION
      The (bottom-up) snippet method:
        • Find a set of relevant documents
        • Extract informative sentences from the documents
        • Order and modify the sentences into an answer
      The (top-down) information extraction method:
        • Build specific answers for different question types:
            • Definition questions
            • Biography questions
            • Certain medical questions
  • 16. QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION
  • 17. MAXIMAL MARGINAL RELEVANCE (MMR)
      An iterative method for content selection from multiple documents.
      Iteratively (greedily) choose the best sentence to insert in the summary/answer so far:
        • Relevant: maximally relevant to the user query
            • High cosine similarity to the query
        • Novel: minimally redundant with the summary so far
            • Low cosine similarity to the summary
      Stop when the summary reaches the desired length.
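The greedy loop on slide 17 can be sketched as below. The bag-of-words vectors, the trade-off parameter `lam`, and measuring redundancy as the maximum similarity to any already-selected sentence are assumptions chosen for the example; the slide only states the relevance and novelty criteria.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, max_sentences=2, lam=0.5):
    """Greedily pick sentences that are relevant to the query yet novel."""
    query_vec = bow(query)
    vectors = [bow(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))

    def mmr_score(i):
        relevance = cosine(vectors[i], query_vec)
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < max_sentences:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

Raising `lam` favors relevance to the query; lowering it penalizes redundancy more strongly, so an exact duplicate of a selected sentence loses to a less relevant but new one.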
  • 18. LLR + MMR: CHOOSING INFORMATIVE YET NON-REDUNDANT SENTENCES
      One of many ways to combine the intuitions of LLR and MMR:
        • Score each sentence based on LLR (including query words)
        • Include the sentence with the highest score in the summary
        • Iteratively add to the summary high-scoring sentences that are not redundant with the summary so far
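One way to read the three steps on slide 18 is sketched below. The sentence scores are assumed to be computed beforehand (for example from LLR and query words), and redundancy is measured here with Jaccard word overlap against a hypothetical `max_overlap` cutoff; the slide does not fix either choice.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences, between 0.0 and 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_non_redundant(scored_sentences, max_sentences=3, max_overlap=0.5):
    """Add sentences in descending score order, skipping redundant ones.

    `scored_sentences` is a list of (sentence, score) pairs.
    """
    summary = []
    ranked = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    for sentence, _score in ranked:
        if len(summary) >= max_sentences:
            break
        if all(jaccard(sentence, chosen) <= max_overlap for chosen in summary):
            summary.append(sentence)
    return summary
```

The highest-scoring sentence is always included first, since the summary is empty at that point; later candidates are kept only if they overlap little with everything already chosen.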
  • 19. INFORMATION ORDERING
      Chronological ordering:
        • Order sentences by the date of the document (for summarizing news)
      Coherence:
        • Choose an ordering that makes neighboring sentences similar (by cosine similarity)
        • Choose an ordering in which neighboring sentences discuss the same entity
      Topical ordering:
        • Learn the ordering of topics in the source document
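Two of the ordering strategies on slide 19 can be sketched briefly. Chronological ordering is a sort on document dates. For coherence, the version below uses a greedy nearest-neighbor heuristic over bag-of-words cosine similarity; that heuristic, and starting from the first sentence, are assumptions for illustration and do not guarantee the best overall ordering.

```python
import math
from collections import Counter
from datetime import date


def chronological_order(dated_sentences):
    """Sort (sentence, document_date) pairs by date, oldest first."""
    return [s for s, _d in sorted(dated_sentences, key=lambda pair: pair[1])]


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va if w in vb)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def greedy_coherence_order(sentences):
    """Start from the first sentence, then repeatedly append the
    remaining sentence most similar to the one just placed."""
    if not sentences:
        return []
    ordered = [sentences[0]]
    remaining = list(sentences[1:])
    while remaining:
        nearest = max(remaining, key=lambda s: cosine(ordered[-1], s))
        ordered.append(nearest)
        remaining.remove(nearest)
    return ordered


news = [("second report", date(2020, 1, 2)), ("first report", date(2020, 1, 1))]
print(chronological_order(news))
```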
  • 20. DOMAIN-SPECIFIC ANSWERING: THE INFORMATION EXTRACTION METHOD
      A good biography of a person contains:
        • The person's birth/death, fame factor, education, etc.
      A good definition contains:
        • Type or category: "The Hajj is a type of ritual"
      A medical answer about a drug's use contains:
        • The problem: medical condition
        • The intervention: drug or procedure
        • The outcome: the result of the study
  • 21. INFORMATION THAT SHOULD BE IN THE ANSWER FOR 3 KINDS OF QUESTIONS
  • 22. ARCHITECTURE FOR ANSWERING COMPLEX QUESTIONS
  • 23. REFERENCES
      • NLP Stanford course
  • 24. THANK YOU
