Summarizing discussion threads
• Suzan Verberne
• SAKE, 12-12-2016
About DISCOSUMO
• Automatic summarization of discussion forum threads
• Radboud University:
- Antal van den Bosch
- Suzan Verberne
• Tilburg University:
- Emiel Krahmer
- Sander Wubben
• Sanoma Media
Case: Viva forum
Problem
• Discussion forums on the web are an important source of information.
• But: forum threads can be extremely long
• → finding information in a forum thread can be a challenge, especially
when accessing the forum from a mobile device
Can we serve mobile forum users better by showing them summaries of
long threads?
Problem
How to summarize a forum thread?
• Question answering forums (e.g. StackOverflow):
- the opening post is a (technical) question and the responses are
answers to that question
- the best answer may be selected by the forum community through
voting
• Discussion forums (e.g. Viva, Autoweek, reddit):
- opinions and experiences are shared
- there is generally no such thing as the best answer
- threads can consist of dozens or hundreds of posts
Case: Viva forum
Viva Forum (forum.viva.nl/)
• Dutch
• predominantly female user community
• 19 million page views per month (1.5 million unique visitors)
• readable for everyone; sample obtained from Sanoma
• most threads: experience and opinion sharing
• no hierarchy in the threads (‘flat structure’, but quotes possible)
• no liking/upvoting
• 21% of threads on the Viva forum have >= 20 posts
Approach
Post/sentence selection:
• Show the user only the most important information
• Hide the less relevant information in between
Demo
How is it made?
1. Collect example data
2. Train classifiers to learn which posts and sentences in a thread are the
most important
3. Apply the classifiers to unseen threads
4. Use a threshold on the classifier prediction to show more/fewer posts
and sentences
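Step 4 can be illustrated with a minimal sketch. It assumes the classifier returns one importance score per post in the range 0 to 1; the function name `select_posts` and the scores are hypothetical and not taken from the DISCOSUMO system.

```python
def select_posts(scores, threshold=0.5):
    """Return the indices of posts whose predicted importance meets the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical classifier scores for a six-post thread (illustrative values only).
scores = [0.91, 0.15, 0.62, 0.40, 0.08, 0.77]

# A high threshold shows fewer posts, a low threshold shows more.
short_summary = select_posts(scores, threshold=0.7)
long_summary = select_posts(scores, threshold=0.3)
```

Lowering the threshold only ever adds posts to the summary, which is what lets a reader expand or shrink it.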
Collect example data
• If you ask five humans to create a summary of a discussion thread, they
create five different summaries
• But: a post selected by four of them is more important than a post
selected by one of them
• We showed 106 long Viva threads, each to 10 different raters, and asked
them to select the posts that they considered the most important for the
thread (the number of selected posts was decided by the rater)
• 57 subjects participated in the study: all female, average age 27
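The idea that a post picked by more raters is more important can be sketched as a simple vote count. This is an illustrative assumption about how selections might be aggregated; the post ids and the helper `vote_counts` are hypothetical.

```python
from collections import Counter


def vote_counts(selections):
    """Count, for each post id, how many raters selected it."""
    counts = Counter()
    for selected in selections:
        counts.update(set(selected))  # each rater counts at most once per post
    return counts


# Hypothetical selections by five raters for one thread (post ids are illustrative).
selections = [[1, 2, 5], [1, 5], [1, 3, 5, 8], [1, 5, 8], [2]]
counts = vote_counts(selections)

# Rank posts by number of votes, breaking ties by position in the thread.
ranked = sorted(counts, key=lambda post: (-counts[post], post))
```

The vote count per post can then serve as a graded importance label when training a model.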
…
Results: Usefulness of thread summarization
• Median usefulness score: 3 (on a 5-point scale)
• Standard deviation: 1.14 (averaged over threads)
• For 92% of the threads, at least one subject gave a usefulness score of 3
or higher
• For 62% of the threads, at least half of the subjects gave a usefulness
score of 3 or higher
Results: Agreement between human raters
• Median number of posts selected per thread: 7, with a large standard
deviation over raters (6.4)
• The agreement between the human summarizers was low (as expected)
Mean Cohen’s Kappa: 0.117
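For reference, Cohen's Kappa for two raters making binary select/skip decisions can be computed as below. This is a sketch of the standard formula, not the authors' evaluation script; averaging over all rater pairs is an assumption about how the mean was obtained, and the example selections are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's Kappa for two equal-length lists of 0/1 decisions (1 = post selected)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    # Agreement expected by chance if both raters selected posts independently.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # Kappa is undefined here; this sketch treats it as full agreement
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(raters):
    """Average Kappa over all pairs of raters (assumed aggregation)."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical decisions of two raters on an eight-post thread.
rater_a = [1, 1, 0, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 1, 0, 0]
kappa = cohen_kappa(rater_a, rater_b)
```

In this example the raters agree on 6 of 8 posts, yet Kappa is only about 0.47, because much of that agreement is expected by chance when most posts are skipped.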
What determines the importance of a post or sentence?
• Number of words (longer = more important)
• Position in the thread (early response = more important)
• Punctuation and emoticons (fewer = more important)
• Similarity to the complete thread (higher = more important)
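The four cues above could be turned into numeric features roughly as follows. This is a minimal sketch under stated assumptions: the feature names, the tokenizer, the punctuation set and the bag-of-words cosine similarity are all illustrative choices, not the feature set used in the study, and emoticons are not handled.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def post_features(posts, index):
    """Features for the post at `index` within the thread `posts`."""
    post = posts[index]
    words = tokens(post)
    thread_counts = Counter(w for p in posts for w in tokens(p))
    return {
        "n_words": len(words),
        "relative_position": index / max(len(posts) - 1, 1),
        "punctuation_ratio": sum(ch in "!?.,;:" for ch in post) / max(len(post), 1),
        "thread_similarity": cosine(Counter(words), thread_counts),
    }


# Hypothetical three-post thread.
posts = [
    "Does anyone have experience with this medication?",
    "Yes!!! :-)",
    "I used it for two years and the side effects were mild.",
]
features = [post_features(posts, i) for i in range(len(posts))]
```

A classifier trained on such features can then learn the direction and weight of each cue from the rater data instead of having them fixed by hand.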
Evaluation setup
• 5-fold cross validation of threads
• Evaluation measures:
- Cohen’s Kappa (agreement with humans)
- Precision/Recall/F1 (using the human summaries as reference)
• Baselines:
- Random: select 7 posts randomly
- Position-based: select the first 7 posts
- Length-based: select the 7 longest posts
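The three baselines and the precision/recall/F1 measure can be sketched as below, together with a thread-level fold split so that all posts of a thread end up in the same fold. Function names, the fixed random seed and the round-robin fold assignment are assumptions made for illustration.

```python
import random


def position_baseline(n_posts, k=7):
    """Select the first k posts of the thread."""
    return set(range(min(k, n_posts)))


def length_baseline(posts, k=7):
    """Select the k longest posts, measured in whitespace-separated words."""
    order = sorted(range(len(posts)), key=lambda i: len(posts[i].split()), reverse=True)
    return set(order[:k])


def random_baseline(n_posts, k=7, seed=0):
    """Select k posts uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_posts), min(k, n_posts)))


def precision_recall_f1(selected, reference):
    """Score a set of selected post indices against a human reference set."""
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def thread_folds(thread_ids, k=5):
    """Split thread ids into k folds; posts are never split across folds."""
    return [thread_ids[i::k] for i in range(k)]


# Hypothetical reference summary for a twelve-post thread.
reference = {0, 2, 3, 9}
precision, recall, f1 = precision_recall_f1(position_baseline(12), reference)
folds = thread_folds(list(range(106)))
```

Here the position baseline recovers 3 of the 4 reference posts with 7 selections, giving an F1 of 6/11.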
Results of the automatic summarization
(human-human Kappa: 0.117)
                    Kappa     F1
random baseline    -0.085   22.8%
position baseline   0.060   35.9%
length baseline     0.092   38.2%
our model           0.138   45.2%
Results of the automatic summarization
• Two different summaries can still both be good summaries
• Is it possible that readers are satisfied by a summary, even though the
summary is different from the summary that they would create
themselves?
→ Pairwise (side-by-side) blind comparison and judgment by human
subjects
Results of the automatic summarization
• Pairwise (side-by-side) blind comparison and judgment by human
subjects: a human summary vs. our model’s summary
- human summary wins 48.3% of the comparisons
- model summary wins 35.7% of the comparisons
- tie: 16.1% of the comparisons
In 51.7% of the direct comparisons, the summary by our model is
considered equal to or better than the human-made summary
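The "equal to or better" figure is the model's win share plus the tie share. A minimal sketch of that tally, using hypothetical judgments and not the study's data:

```python
from collections import Counter


def outcome_shares(judgments):
    """Fraction of pairwise judgments won by 'human', won by 'model', or tied."""
    counts = Counter(judgments)
    total = len(judgments)
    return {outcome: counts[outcome] / total for outcome in ("human", "model", "tie")}


# Hypothetical judgments for ten pairwise comparisons (illustrative only).
judgments = ["human"] * 5 + ["model"] * 3 + ["tie"] * 2
shares = outcome_shares(judgments)

# Share of comparisons where the model summary is judged equal to or better.
model_at_least_equal = shares["model"] + shares["tie"]
```

With the rounded percentages above, 35.7% + 16.1% gives 51.8%; the reported 51.7% presumably comes from the unrounded counts.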
Conclusions
• Subjects value the idea of thread summarization through post selection
• But inter-rater agreement for this task is low
• Despite the low agreement, we can automatically generate summaries that
will in half of the cases be judged equal to or better than summaries created
by another human
• Also, the agreement between the model and human subjects is not lower
than the agreement among human subjects
• Two different summaries can both be good summaries
Thank you! Questions?
• http://discosumo.ruhosting.nl/
• http://sverberne.ruhosting.nl/

Editor's Notes

  1. thyrax migraine
  2. The 4 raters who summarized the most threads in the first study (A, B, C, D). Data: all threads that were summarized by at least two of them, and for which all raters gave a usefulness score of 3 or higher → 52 threads. Comparison of 3 summaries of the threads they had summarized before: their own summary (‘own’), the summary by one of the other subjects (‘other’), and the summary generated by our generic model (‘model’). Pairwise comparison, randomized and blind (429 pairs).
  3. replace table. sum is 119%
  4. The agreement between the model and human subjects is not lower than the agreement among human subjects. Moreover, in a side-by-side comparison between a summary created by our model and a summary created by a human subject, the model-generated summary was voted for 42.5% of the time.