Text Summarization
For Review and Feedback
By: Aman Sadhwani
Monday, May 18, 2015
What is Text Summarization?
And why do we need it?
• A summary is a text that reflects the main and most important sentences of the original
text. In text summarization, the summary is generated by a computer.
• In recent years the amount of textual information has been growing rapidly, day by day.
It becomes more difficult for the user to read all of it, which also leads to a loss of
interest. Text summarization came into the picture to address this problem.
Types of Text Summarization
1) Extraction: In extractive text summarization, the summary is generated by selecting a set
of words, phrases, paragraphs, or sentences from the original document.
2) Abstraction: Abstractive methods build a semantic representation of the text and then use
natural language processing techniques to generate a summary that is closer to one
written manually. This kind of summary may contain words that are not found in the
original document. Research on this method is ongoing, and demand for it is higher.
Proposed System
• We have developed and compared two text summarization techniques:
1) Reduction based
2) Intersection based
How the Reduction Algorithm Works
• Step 1 - Takes a text as input.
• Step 2 - Splits it into one or more paragraphs.
• Step 3 - Splits each paragraph into one or more sentences.
• Step 4 - Splits each sentence into one or more words.
• Step 5 - Gives each sentence a weight (a floating-point value) by comparing its words
to a predefined dictionary called "stopWords.txt".
• If a word of a sentence matches any word in the predefined dictionary, the word is
considered low weighted.
• Step 6 - An ordered list of weighted sentences is then prepared (relatively high-weighted
sentences come first and low-weighted sentences come last).
• Step 7 - Working through the ordered list, it stores each sentence in the output variable
(a list) until it reaches the reduction ratio (it uses a formula to determine the maximum
number of sentences to put in the output list).
• Step 8 - The output list is then returned.
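The eight steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It is written for Python 3 and uses only the standard library. The small STOP_WORDS set stands in for the contents of "stopWords.txt". The weight (the share of non-stop words in a sentence) and the limit (ceil(ratio * number of sentences)) are assumptions, since the slides do not give either formula. All function names are hypothetical.

```python
import math
import re

# Hypothetical stand-in for the contents of "stopWords.txt".
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on"}


def split_sentences(text):
    """Steps 2-3: split text into paragraphs, then paragraphs into sentences."""
    sentences = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append(sentence)
    return sentences


def sentence_weight(sentence):
    """Steps 4-5: weight = share of words that are not stop words (assumed)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    content_words = [w for w in words if w not in STOP_WORDS]
    return len(content_words) / len(words)


def reduce_text(text, ratio=0.5):
    """Steps 6-8: order sentences by weight and keep the top share."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    ranked = sorted(sentences, key=sentence_weight, reverse=True)
    # Assumed reduction formula: keep ceil(ratio * n) sentences, at least one.
    limit = max(1, math.ceil(ratio * len(sentences)))
    return ranked[:limit]


if __name__ == "__main__":
    sample = ("The cat is on the mat. "
              "Quantum computers factor large integers quickly. "
              "It is in the box.")
    for line in reduce_text(sample, ratio=0.5):
        print(line)
```

Note that this sketch returns sentences in weight order, as Step 6 describes. Restoring the original document order before output would be a separate design choice.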
How the Intersection Algorithm Works
1. Split the input text into paragraphs.
2. Split each paragraph into sentences.
3. Split each sentence into words.
4. Calculate the intersection between two sentences.
5. Remove non-alphabetic characters from each sentence.
6. Convert the content into a dictionary.
7. Build the sentence dictionary.
8. Return the best sentence in a paragraph.
9. Get the best sentences according to the dictionary.
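The steps above can be sketched roughly as follows. This is an illustrative Python 3 sketch under stated assumptions, not the project's actual code. The intersection score (shared words divided by the average number of distinct words in the two sentences), the rule of skipping paragraphs with fewer than two sentences, and all function names are assumptions made for the example.

```python
import re


def split_paragraphs(text):
    """Step 1: split the input text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Step 2: split a paragraph into sentences after ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def intersection(sent_a, sent_b):
    """Steps 3-4: shared words, normalised by the average word-set size."""
    words_a = set(re.findall(r"[a-z]+", sent_a.lower()))
    words_b = set(re.findall(r"[a-z]+", sent_b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / ((len(words_a) + len(words_b)) / 2)


def sentence_key(sentence):
    """Step 5: strip non-alphabetic characters to form a dictionary key."""
    return re.sub(r"[^a-zA-Z]+", "", sentence)


def rank_sentences(text):
    """Steps 6-7: map each sentence key to its summed intersection score."""
    sentences = [s for p in split_paragraphs(text) for s in split_sentences(p)]
    ranks = {}
    for i, current in enumerate(sentences):
        ranks[sentence_key(current)] = sum(
            intersection(current, other)
            for j, other in enumerate(sentences) if i != j
        )
    return ranks


def best_sentence(paragraph, ranks):
    """Step 8: return the highest-ranked sentence of one paragraph."""
    sentences = split_sentences(paragraph)
    # Assumption: paragraphs with fewer than two sentences are skipped.
    if len(sentences) < 2:
        return ""
    return max(sentences, key=lambda s: ranks.get(sentence_key(s), 0.0))


def summarize(text):
    """Step 9: collect the best sentence from every paragraph."""
    ranks = rank_sentences(text)
    picks = [best_sentence(p, ranks) for p in split_paragraphs(text)]
    return [s for s in picks if s]
```

Because this sketch keeps at most one sentence per paragraph, a text with only one or two paragraphs yields a summary of only one or two sentences.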
Flow Chart
Screenshots
Conclusion
• Looking at the last table, we can say that intersection is faster than reduction.
• But reduction creates a better summary than intersection.
• Intersection works fine on some documents but generates only a 1- or 2-line summary on
others.
• This is because intersection is the most basic algorithm for text summarization. Unlike
reduction, it doesn't use any NLP libraries.
Hardware & Software Requirements
• Minimum Hardware Requirements
  • Processor: Intel Pentium II or higher
  • RAM: 128 MB or higher
  • Monitor, keyboard, mouse
  • Printer (optional)
  • Hard disk: 20 GB or higher
• Software Requirements
  • OS: Windows XP or higher
  • Java installed on the machine
  • Python 2.7 installed on the machine
Tools Used
• NetBeans
• Python 2.7 IDLE
Future Enhancements
• Support summarization for multiple file types.
• User-wise document management.
• Multi-document summarization.
• Improved summarization algorithms.
THANK YOU
