Comments-Oriented Blog Summarization
Motivation
• Comments left by readers on Web documents contain valuable
information that can be used in information retrieval (IR) tasks
such as document search, visualization, and summarization.
• In this project we aim to summarize a Web document (e.g. a blog post) by
considering the comments left by its readers.
• Web documents are now presented with annotations given by their
readers in the form of tags, comments, ratings, and others.
• These annotations, including comments, are valuable input from users and
can be used in different IR tasks.
• By considering these comments, the generated summary can better
capture the input of the readers, rather than that of the document's
author alone.
• A comments-oriented summary provides a balanced view from both the author
and the readers.
Introduction
Problem Statement
Given a blog post consisting of a set of sentences P = {s1, s2, ..., sn} and
the set of comments C = {c1, c2, ..., cm} associated with that post, the task of
comments-oriented blog summarization is to extract a subset of the sentences in P,
denoted by Sr (Sr ⊂ P), that best represents the discussion in C.
Solution
• Score blog sentences by their similarity to the top-scored relevant
comments.
• Comments are scored using an RQT graph/tensor-based approach and a
Named Entity Similarity score.
Abstract Solution
Comment Oriented Blog Summarization
Approach
• In summary generation it is important to retrieve the relevant comments.
• A comment is relevant if it reflects the topic discussed in the blog or
attracts many replies.
• Each comment is scored using the RQT model and the Named Entity Similarity
model, and the top-scoring comments are selected as relevant comments.
• The similarity score of each sentence is the sum of the cosine
similarities between that sentence and the selected top comments.
• The top-scored sentences are the ones that most commenters picked up on,
and hence are relevant for the summary.
RQT Model
• Three factors determine the RQT score (Rc) of a comment:
  • Response Count (Cr): the number of replies to the comment.
  • Topic Related Cluster Count (Ct): comments are clustered by topic using cosine similarity.
  • Quotation Count (Cq): the number of times the comment is quoted in other comments.
Rc = Cr + Ct + Cq
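The RQT sum can be illustrated with a minimal Python sketch. The slides do not say exactly how Ct is derived from the clustering, so this sketch assumes Ct is the number of other comments whose bag-of-words cosine similarity to the comment reaches a threshold. The `Comment` fields, the `rqt_scores` name, and the `sim_threshold` value are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Comment:
    cid: str                                          # comment id
    text: str
    reply_to: Optional[str] = None                    # id of the comment replied to
    quotes: List[str] = field(default_factory=list)   # ids of comments quoted here


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rqt_scores(comments: List[Comment], sim_threshold: float = 0.3) -> Dict[str, int]:
    """Return Rc = Cr + Ct + Cq for every comment."""
    vectors = {c.cid: Counter(c.text.lower().split()) for c in comments}
    scores = {}
    for c in comments:
        others = [o for o in comments if o.cid != c.cid]
        cr = sum(1 for o in others if o.reply_to == c.cid)    # replies received
        cq = sum(o.quotes.count(c.cid) for o in others)       # times quoted
        # ASSUMPTION: Ct = number of other comments similar enough to this one.
        ct = sum(1 for o in others
                 if cosine(vectors[c.cid], vectors[o.cid]) >= sim_threshold)
        scores[c.cid] = cr + ct + cq
    return scores
```

Calling `rqt_scores` on a list of `Comment` objects returns a dictionary from comment id to Rc. A real system would likely replace the threshold count with the size of the cluster produced by whichever clustering method is used.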
Additional Factors
• Likes Count (Cl): Our dataset comes from Techcrunch.com, where people comment
using Facebook. Likes on a comment significantly increase its relevance,
so the number of likes (Cl) is added to the weight:
Rc = Cr + Ct + Cq + Cl
• Named Entity Similarity: Named entities in a comment are identified with the
Stanford POS tagger, and the named entity score (Ec) is the
number of named entities in the comment.
The final score for each comment is calculated as
Score(C) = Rc + Ec
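As a rough illustration of the final comment score, the sketch below adds the likes count to the RQT terms and then adds a named entity count. It does not call the Stanford tagger: `named_entity_count` is a crude, hypothetical stand-in that counts runs of capitalised words that do not start a sentence, and it will miscount in many cases. The function names are assumptions made for this example.

```python
import re


def named_entity_count(text: str) -> int:
    """Crude stand-in for a tagger (ASSUMPTION, not the Stanford tool):
    count runs of capitalised words that are not sentence-initial."""
    count = 0
    for sentence in re.split(r"[.!?]+", text):
        in_run = False
        for token in sentence.split()[1:]:   # skip the sentence-initial word
            is_cap = token[:1].isupper()
            if is_cap and not in_run:
                count += 1                   # a new capitalised run starts here
            in_run = is_cap
    return count


def final_comment_score(cr: int, ct: int, cq: int, cl: int, text: str) -> int:
    """Score(C) = Rc + Ec, with Rc = Cr + Ct + Cq + Cl."""
    rc = cr + ct + cq + cl
    ec = named_entity_count(text)
    return rc + ec
```

For example, `final_comment_score(2, 1, 1, 5, "I think Google and Apple Inc will compete.")` combines an Rc of 9 with two detected entity runs ("Google" and "Apple Inc").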
Sentence Scoring
• Comments whose weights are greater than a threshold value are chosen as
top comments.
• The cosine similarity of each sentence with the top comments is calculated.
• Sentences are assigned a score based on their cosine similarity with the top
comments:
Score(Si) = Summation(CS(Si, Comments))
CS: Cosine Similarity, Si: Blog Sentence
• Only the top 5~7 sentences that have more than 6~8 words are
selected as the summary for the blog. More or fewer sentences can
be selected depending on the summary percentage required.
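The sentence-scoring step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: text is represented as raw term-frequency bags of words (the slides do not specify the weighting), and the names `summarize`, `threshold`, `top_k`, and `min_words` are hypothetical. Selected sentences are returned in their original order, which is also an assumption.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bow(text: str) -> Counter:
    """Lower-cased bag of words (ASSUMPTION: raw term counts, no stemming)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(sentences: List[str],
              scored_comments: List[Tuple[str, float]],
              threshold: float,
              top_k: int = 5,
              min_words: int = 6) -> List[str]:
    """Pick the top_k sentences by summed cosine similarity to top comments."""
    # Keep only comments whose weight exceeds the threshold.
    top_comments = [bow(text) for text, score in scored_comments
                    if score > threshold]
    scored = []
    for idx, sentence in enumerate(sentences):
        if len(sentence.split()) <= min_words:   # drop very short sentences
            continue
        vec = bow(sentence)
        score = sum(cosine(vec, c) for c in top_comments)  # Score(Si)
        scored.append((score, idx))
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:top_k]
    return [sentences[i] for _, i in sorted(best, key=lambda p: p[1])]
```

Here `scored_comments` is a list of `(comment_text, Score(C))` pairs. Raising `top_k` or lowering `min_words` corresponds to asking for a longer summary.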
Experiments And Results
• 10 blogs with a large number of comments were randomly chosen, and
summaries were generated at 30% and 20% of the words.
• Summaries were also generated using online tools and compared with the
system-generated summaries using the ROUGE Summary Evaluation Package by
Chin-Yew Lin.
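To give a feel for what a ROUGE score measures, here is a minimal sketch of ROUGE-N recall: the fraction of the reference summary's n-grams that also appear in the candidate summary, with counts clipped. It is a simplification and not the evaluation package itself, which supports further options such as stemming and multiple references. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

With whitespace tokenization, a candidate that shares five of a reference's six unigrams gets a ROUGE-1 recall of 5/6.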
Conclusions
• Our approach depends on a number of factors, such as the number of likes
each comment has received and the length of the blog content.
• The generated summary received a lower ROUGE score when there were not
enough comments.
• The generated summary was less accurate when none of the comments were
accurate.
• Scoring the comments based on named entities significantly increased the
accuracy of the comment ranking.
• The system needs more testing and a larger dataset in order to find optimal
values of the constants.
