Text Summarization

HINDI TEXT SUMMARIZATION
Presented by:
o Abhishek Kumar (CSE - 10114008)
o Nishant Kumar (CSE - 10114017)
Guided by:
• Dr. Manjira Sinha

Introduction
• Text summarization is a technique that extracts sentences from a text document, determines which of them are most important, and returns the vital information in a short (usually about half the original length), readable, and structured form.
• It gives the reader a filtered description of the source text and a non-redundant presentation of the facts found in it.

Why We Need Text Summarization
• Business leaders, analysts, students, and academic researchers need to go through huge numbers of documents every day to keep ahead, and they spend a large portion of their time just figuring out which documents are relevant and which are not.
• By extracting the important sentences and creating comprehensive summaries, it is possible to quickly assess whether a document is worth reading.
• Everyday examples of summaries:
 o News headlines
 o The abstract of a technical paper
 o A book review or a movie preview

Why Choose Hindi as the Language of Study?
• It is the native language of most people in Bihar, Jharkhand, Uttar Pradesh, Madhya Pradesh, Delhi, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan, and other states.
• A lot of work has been done on English, but relatively few researchers have worked on Hindi.
• It is an official language of the Government of India, alongside English.
• It is written in the Devanagari script, which has a large alphabet.

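Approaches of Summarization
• Extraction-based (selects the important sentences of the source text as they are)
 o Statistical
 o Linguistic
 o Hybrid
• Abstraction-based (generates new sentences that convey the main ideas of the source)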

Proposed System
1. Hindi text document
2. Preprocessing
3. Extracting sentence features
4. Sentence ranking
5. Summary

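Preprocessing
• Sentence segmentation
 o Sentences are split at the पूर्ण विराम (full stop, ।)
• Tokenization
 o Punctuation treated as delimiters: पूर्ण विराम (full stop, ।), उपविराम (colon, :), अर्ध विराम (semicolon, ;)
• Stop-word removal
 o कारक, case markers (ने, को, से, के लिए, में, पर)
 o सर्वनाम, pronouns (आप, तू, यह, वह, कुछ)
 o समुच्चयबोधक अव्यय, conjunctions (और, लेकिन, पर, एवं, इसलिए, मगर)
 o समास, compound-forming words (अनुसार, पर्यन्त, वाला)
• Stemming
 o Suffixes and plural endings are stripped so that related word forms map to the same stem (e.g. भारतीय → भारत)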
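The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`split_sentences`, `tokenize`, `stem`, `preprocess`) are hypothetical, the stop-word set holds only the examples listed on the slide, treating the ASCII `|` and the comma as extra delimiters is an assumption, and the suffix list is a small hypothetical light-stemming list chosen so that भारतीय reduces to भारत.

```python
import re

# Stop words listed on the slide; "के लिए" splits into two tokens, so both parts appear.
STOPWORDS = {
    "ने", "को", "से", "के", "लिए", "में", "पर",   # कारक (case markers)
    "आप", "तू", "यह", "वह", "कुछ",               # सर्वनाम (pronouns)
    "और", "लेकिन", "एवं", "इसलिए", "मगर",        # समुच्चयबोधक अव्यय (conjunctions)
    "अनुसार", "पर्यन्त", "वाला",                  # समास (compound-forming words)
}

# Hypothetical light-stemming suffix list (illustration only), tried longest first.
SUFFIXES = sorted(["ियों", "ियाँ", "ाओं", "ाएं", "ीय", "ों", "ें", "ाँ"],
                  key=len, reverse=True)


def split_sentences(text):
    """Split on the पूर्ण विराम (।); ASCII '|' is accepted as a stand-in (assumption)."""
    return [s.strip() for s in re.split(r"[।|]", text) if s.strip()]


def tokenize(sentence):
    """Split on whitespace and । : ; plus '|' and ',' (the last two are assumptions)."""
    return [t for t in re.split(r"[\s।|:;,]+", sentence) if t]


def stem(word, min_stem=2):
    """Remove the longest matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Return, for each sentence, its stemmed tokens with stop words removed."""
    return [
        [stem(tok) for tok in tokenize(sentence) if tok not in STOPWORDS]
        for sentence in split_sentences(text)
    ]


print(preprocess("राम ने आम खाया। भारतीय लोग मेहनती हैं।"))
```

Stop words are checked before stemming so that the slide's word lists match tokens exactly as written; a real system would need a much larger stop-word list and suffix list.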

Feature Extraction
• Text Rank Feature
• Word frequency
 o Sentences containing the most frequent words in the paragraph receive a higher ranking.
• Sentence length
 o Sentences that are too long or too short are eliminated.
• Sentence position
 o The position of a sentence in the text decides its importance.
 o Beginning: states the theme
 o End: gives the conclusion or summary
• Title word feature
 o Sentences containing words that match the title words of the paragraph are favored for inclusion in the summary.
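Assuming each sentence has already been preprocessed into a list of tokens, the word frequency, sentence length, sentence position, and title word features can be scored roughly as below. The slides do not give formulas, so every scoring rule here is an assumption: word frequency is the sum of paragraph-wide counts of a sentence's words, the length limits `min_len=3` and `max_len=25` are hypothetical, the position score is a U-shaped weight that favors the first and last sentences, and the title score is the fraction of title words a sentence contains. The function names are hypothetical, and the Text Rank feature is not modeled because the slides do not describe it.

```python
from collections import Counter


def word_frequency_scores(sentences):
    """Score each sentence by summing the paragraph-wide counts of its words."""
    freq = Counter(tok for sent in sentences for tok in sent)
    return [sum(freq[tok] for tok in sent) for sent in sentences]


def length_scores(sentences, min_len=3, max_len=25):
    """1.0 if a sentence has between min_len and max_len tokens, else 0.0 (eliminated).
    The default limits are hypothetical."""
    return [1.0 if min_len <= len(sent) <= max_len else 0.0 for sent in sentences]


def position_scores(n):
    """Assumed U-shaped weight: first (theme) and last (conclusion) sentences score 1.0,
    the middle of the text about 0.5."""
    if n == 1:
        return [1.0]
    return [max(1 - i / (n - 1), i / (n - 1)) for i in range(n)]


def title_word_scores(sentences, title_tokens):
    """Fraction of the distinct title words that occur in each sentence."""
    title = set(title_tokens)
    if not title:
        return [0.0] * len(sentences)
    return [len(title & set(sent)) / len(title) for sent in sentences]


# Example with already-preprocessed (hypothetical) token lists and title ["भारत"].
sentences = [["भारत", "देश", "बड़ा"], ["भारत", "लोग", "मेहनती"], ["आम", "फल"]]
features = [
    word_frequency_scores(sentences),
    length_scores(sentences),
    position_scores(len(sentences)),
    title_word_scores(sentences, ["भारत"]),
]
print(features)
```

Each function returns one raw score per sentence; the scores are on different scales, which is why they still have to be normalized before they are combined.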

Sentence Ranking & Summary
1. Calculate the ranking value of each sentence for each selected feature.
2. Normalize each feature's ranking values to the scale 0 to 1.
3. Add all the normalized feature values to get the final ranking value of each sentence.
4. Sort the sentences by final ranking value in descending order.
5. Based on the required summary percentage, select sentences in descending order of ranking value.
6. Print the selected sentences in the order in which they appear in the original paragraph.
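The ranking steps above can be sketched as below, taking one list of raw scores per feature (such as word frequency or sentence position). Min-max scaling is assumed as the normalization to 0 to 1, a feature whose values are all equal is mapped to zeros so that it does not affect the ranking, and the number of sentences kept is the summary percentage of the sentence count rounded up (at least one). These choices and the function names are hypothetical, not taken from the slides.

```python
import math


def normalize(scores):
    """Min-max scale scores into [0, 1]; a constant feature becomes all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def final_scores(feature_scores):
    """Normalize every feature, then add the normalized values sentence by sentence."""
    normalized = [normalize(scores) for scores in feature_scores]
    return [sum(values) for values in zip(*normalized)]


def summarize(sentences, feature_scores, percent=50):
    """Keep the top `percent` % of sentences by final score, in original order."""
    scores = final_scores(feature_scores)
    k = max(1, math.ceil(len(sentences) * percent / 100))
    # Indices sorted by descending score; sorted() is stable, so ties keep text order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])  # restore the order of the original paragraph
    return [sentences[i] for i in chosen]


sentences = ["S1", "S2", "S3", "S4"]
features = [[3, 4, 2, 1], [1.0, 0.5, 1.0, 0.0]]
print(summarize(sentences, features, percent=50))
```

Selecting by index and then re-sorting the chosen indices is what keeps the printed summary in the paragraph's original order, even when a later sentence outranks an earlier one.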

Future Work
• We will add more features, such as:
 o Proper nouns, numerical data, and sentence similarity
• To optimize our algorithm, we plan to use:
 o Genetic Algorithm (GA)
 o Artificial Neural Network (ANN)
• We will build GUI-based software for a better user experience.

Thank you for your attention! (धन्यवाद!)
