Email conversation summarization


Summarization has emerged as an increasingly useful approach to the problem of information overload. Extracting information from online conversations can be of great commercial and educational value. However, most of this information is present as noisy, unstructured text, which makes traditional document summarization techniques difficult to apply. In this project, we propose a novel approach to the problem of conversation summarization. We develop an automatic text summarizer that extracts sentences from email conversations to form a summary. Our approach consists of three phases. In the first phase, we prepare the dataset by correcting spelling and segmenting the text. In the second phase, we represent each sentence by a set of predefined features. Finally, in the third phase, we use a machine learning algorithm to train the summarizer on the set of feature vectors. We also developed an interface that takes the document to be summarized as input and returns an extractive summary.
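
As a rough illustration of the second phase, the sketch below turns each sentence into a small feature vector. It covers only some of the features named in the slides (mean tf-isf, sentence length, sentence position, similarity to title, is question). The function names, the regex-based sentence splitter, the normalization of length and position, and the use of Jaccard overlap for title similarity are all assumptions made for this example; the project's actual definitions are not given in the source.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '?' or '!' plus whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def sentence_features(sentences, title):
    """Return one feature dict per sentence; the key names are hypothetical."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence frequency: the number of sentences each word occurs in.
    sent_freq = Counter(w for toks in tokenized for w in set(toks))
    title_words = set(tokenize(title))
    max_len = max((len(t) for t in tokenized), default=1) or 1

    features = []
    for i, toks in enumerate(tokenized):
        counts = Counter(toks)
        # tf-isf of a word: its count in the sentence times
        # log(number of sentences / sentences containing the word).
        tf_isf = [counts[w] * math.log(n / sent_freq[w]) for w in counts]
        words = set(toks)
        union = words | title_words
        features.append({
            "mean_tf_isf": sum(tf_isf) / len(tf_isf) if tf_isf else 0.0,
            "length": len(toks) / max_len,  # relative to the longest sentence
            "position": i / (n - 1) if n > 1 else 0.0,  # 0 = first, 1 = last
            # Jaccard overlap with the title (an assumed similarity measure).
            "title_similarity": len(words & title_words) / len(union) if union else 0.0,
            "is_question": 1.0 if sentences[i].rstrip().endswith("?") else 0.0,
        })
    return features


# Usage on an invented three-sentence email with an invented subject line.
text = "Can we meet on Friday? The meeting is about the budget. Friday works for me."
sentences = split_sentences(text)
for sentence, feats in zip(sentences, sentence_features(sentences, "Budget meeting")):
    print(sentence, feats)
```

Each dictionary can then be flattened into a numeric vector for the learning phase. Features such as mean tf-idf and centroid coherence need corpus-level statistics and are left out of this sketch.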

Published in: Education, Technology

  1. Email Conversation Summarization
     IRE Major Project, Team 56
  2. Introduction
     • An automatic extractive text summarizer that extracts sentences from email conversations to form a summary
     • Based on sentence features and machine learning algorithms
  3. Approach
     • Three phases:
       1. Preprocessing
       2. Feature extraction
       3. Machine learning model
  4. Preprocessing
     • Cleaning errors such as spelling mistakes from the document, so that the feature values computed from it are more reliable
     • Procedure:
       1. Remove stop-words from the text
       2. Query the web for each word using a trigram of words and obtain the closest matching word
  5. Feature extraction
     • The following features are extracted from the training document:
       1. Mean tf-idf
       2. Mean tf-isf
       3. Sentence length
       4. Sentence position
       5. Similarity to title
       6. Centroid coherence
       7. Is question
  6. Summarizer training
     • Use a Naïve Bayes classifier to train the model on the set of features extracted for each sentence.
     • After training, the sentences in the training data are classified as ‘y’ or ‘n’, where ‘y’ means the sentence is part of the summary.
     • For new test data, each sentence is classified as important or not based on its extracted features, and is included in the final summary if important.
  7. Thank you
     • Abhishek Kumar
     • Ankur Kadam
     • Savitansh Srivastava
     • Sneha Nallani
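
The summarizer training step in the slides can be sketched roughly as follows. The source only says that a Naïve Bayes classifier is used; the Gaussian variant, the class and function names, the choice of three features per row, and every number in the toy training data are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes over numeric feature vectors (assumed variant)."""

    def fit(self, X, y):
        by_label = defaultdict(list)
        for row, label in zip(X, y):
            by_label[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, rows in by_label.items():
            self.priors[label] = len(rows) / len(X)
            per_feature = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                # A small floor keeps the variance strictly positive.
                per_feature.append((mean, var + 1e-6))
            self.stats[label] = per_feature
        return self

    def _log_posterior(self, row, label):
        # log prior plus the sum of per-feature Gaussian log-likelihoods.
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, row):
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


def summarize(sentences, feature_rows, model):
    """Keep the sentences the model labels 'y', in their original order."""
    return [s for s, row in zip(sentences, feature_rows) if model.predict(row) == "y"]


# Invented rows: [mean tf-isf, similarity to title, sentence position].
train_X = [
    [0.9, 0.6, 0.0], [0.8, 0.5, 0.1], [0.7, 0.7, 0.5],  # part of the summary
    [0.2, 0.0, 0.9], [0.1, 0.1, 0.4], [0.3, 0.0, 1.0],  # not part of the summary
]
train_y = ["y", "y", "y", "n", "n", "n"]
model = GaussianNaiveBayes().fit(train_X, train_y)

new_sentences = ["Budget review is moved to Friday.", "Thanks, talk soon."]
new_rows = [[0.85, 0.6, 0.1], [0.15, 0.05, 0.9]]
print(summarize(new_sentences, new_rows, model))
```

The ‘y’ and ‘n’ labels follow the slides. In practice a library implementation of Naïve Bayes would likely replace the hand-written class, and binary features such as "is question" may be better served by a Bernoulli-style likelihood than a Gaussian one.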