Improving Low Quality StackOverflow Post Detection
Luca Ponzanelli, Andrea Mocci, Michele Lanza (University of Lugano, Switzerland)
Alberto Bacchelli (Delft University of Technology, Netherlands)
David Fullerton (StackExchange Inc., New York, USA)
StackOverflow
A Question can receive multiple Answers
6,000+ daily questions
StackOverflow Review Process
[Diagram: incoming questions (Q) are reviewed by Moderators and by the System]
StackOverflow Review Process
Review queues: Suggested Edits; Late Answers and First Posts; Low Quality Posts
StackOverflow Review Process
Low Quality Posts are identified by the system.
An inefficient approach increases the review queue size.
An efficient approach saves reviewers' time.
Goal: refine the Low Quality Posts review queue to remove misclassified posts.
StackOverflow Quality Metrics (Pure Textual Metrics)
Body Length
Capital Title
Emails Count
Lowercase Percentage
Spaces Count
Tags Count
Text Speak Count
Title Body Similarity
Title Length
Uppercase Percentage
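The slides name the textual metrics but do not define them. The sketch below shows one plausible way to compute each from a post's title, body, and tags; the text-speak word list, the email regex, and the use of `difflib.SequenceMatcher` for title/body similarity are all assumptions, not StackOverflow's actual definitions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical word list and email pattern: the slides do not define them.
TEXT_SPEAK = {"u", "ur", "r", "plz", "pls", "thx", "lol", "btw"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def textual_metrics(title: str, body: str, tags: list[str]) -> dict[str, float]:
    """Compute illustrative versions of the pure textual metrics for one post."""
    letters = [c for c in body if c.isalpha()]
    n_letters = len(letters) or 1  # avoid dividing by zero for bodies without letters
    words = re.findall(r"[a-z']+", body.lower())
    return {
        "body_length": len(body),
        "title_length": len(title),
        # 1.0 if the title starts with an uppercase letter, else 0.0
        "capital_title": float(title[:1].isupper()),
        "emails_count": len(EMAIL_RE.findall(body)),
        "spaces_count": body.count(" "),
        "tags_count": len(tags),
        "text_speak_count": sum(w in TEXT_SPEAK for w in words),
        "uppercase_percentage": sum(c.isupper() for c in letters) / n_letters,
        "lowercase_percentage": sum(c.islower() for c in letters) / n_letters,
        # similarity ratio in [0, 1] between the lowercased title and body
        "title_body_similarity": SequenceMatcher(None, title.lower(), body.lower()).ratio(),
    }


m = textual_metrics("How to sort a list?", "plz help, u know how? Mail me at a@b.com", ["python"])
print(m["emails_count"], m["text_speak_count"], m["spaces_count"])  # prints: 1 2 8
```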
StackOverflow Quality Metrics
Three metric families: Textual Metrics, Readability Metrics, Popularity Metrics
Readability Metrics
Average Term Entropy
Automated Readability Index
Coleman-Liau Index
Flesch-Kincaid Grade Level
Flesch Reading Ease Score
Gunning Fog Index
LOC Percentage
Metric Entropy
Sentences Count
SMOG Grade
Words Count
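Several of the readability metrics above are standard formulas over word, sentence, syllable, and character counts. As a sketch, assuming those counts are already available (syllable counting and code-block stripping are left out), the published formulas can be computed as follows; using one character count for both ARI and Coleman-Liau is a simplification.

```python
import math


def readability_scores(words: int, sentences: int, syllables: int,
                       characters: int, polysyllables: int) -> dict[str, float]:
    """Standard readability formulas computed from pre-counted text statistics.

    characters: letters and digits (used for both ARI and Coleman-Liau here,
    a simplification); polysyllables: words with three or more syllables.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    wps = words / sentences          # average words per sentence
    spw = syllables / words          # average syllables per word
    letters_per_100 = 100 * characters / words
    sentences_per_100 = 100 * sentences / words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "automated_readability_index": 4.71 * characters / words + 0.5 * wps - 21.43,
        "coleman_liau_index": 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8,
        "gunning_fog_index": 0.4 * (wps + 100 * polysyllables / words),
        "smog_grade": 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291,
    }


scores = readability_scores(words=100, sentences=5, syllables=150,
                            characters=450, polysyllables=10)
```

With 20 words per sentence and 1.5 syllables per word, this gives a Flesch Reading Ease of about 59.6 and a Flesch-Kincaid grade of about 9.9.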
Popularity Metrics
Accepted by Originator Votes
Approved Edit Suggestion
Answer Badges Count
Badges-Tags Coverage
Bounty Start (End) Votes
Close Votes
Deletion Votes
Down Votes
Favorite Votes
Moderator Review Votes
Offensive Votes
Reopen Votes
Question Badges Count
Spam Votes
Total Badges
Undeletion Votes
Up Votes
Classification Approach
Data: StackOverflow Public Dump, 5,648,975 questions (September 2013)
Classification Approach
Quality classes: Very Good (A), Good (B), Bad (C), Very Bad (D)
L. Ponzanelli, A. Mocci, A. Bacchelli, M. Lanza. Understanding and Classifying the Quality of Technical Forum Questions. In Proceedings of QSIC 2014 (14th International Conference on Quality Software).
Very Good (A): neither closed nor deleted, with an accepted answer, Score > 7
Good (B): neither closed nor deleted, with an accepted answer, 1 < Score < 6
Bad (C): neither closed nor deleted, with an accepted answer, Score < 0
Very Bad (D): closed or deleted
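The class definitions above can be encoded as a small labeling function. This is a sketch that uses the thresholds exactly as written on the slides; the `Question` record is a hypothetical stand-in for a row of the public dump, and questions matching no band (for example a score of 0, 1, 6 or 7 under these thresholds) are left unlabeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """Hypothetical minimal view of a question from the public dump."""
    score: int
    closed: bool
    deleted: bool
    has_accepted_answer: bool


def quality_class(q: Question) -> Optional[str]:
    """Return 'A', 'B', 'C', 'D', or None if the question fits no class."""
    if q.closed or q.deleted:
        return "D"                  # Very Bad: closed or deleted
    if not q.has_accepted_answer:
        return None                 # A, B and C all require an accepted answer
    if q.score > 7:
        return "A"                  # Very Good
    if 1 < q.score < 6:
        return "B"                  # Good
    if q.score < 0:
        return "C"                  # Bad
    return None                     # score falls between the bands
```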
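Classification Function
Genetic Algorithm
$QF = \sum_{i=1}^{n} w_i \cdot m_i, \qquad w_i \in [-1, 1], \quad m_i \in [0, 1]$
where $m_i$ is the value of the $i$-th metric and $w_i$ is its weight, tuned by the genetic algorithm.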
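The quality function is a weighted sum of metric values. The sketch below evaluates $QF$ and checks the stated ranges; since the slides require $m_i \in [0, 1]$ without saying how raw metrics are scaled, min-max normalization is shown here as an assumption.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Scale raw metric values into [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant metric carries no information
    return [(v - lo) / (hi - lo) for v in values]


def quality_function(weights: list[float], metrics: list[float]) -> float:
    """QF = sum_i w_i * m_i with w_i in [-1, 1] and m_i in [0, 1]."""
    if len(weights) != len(metrics):
        raise ValueError("one weight per metric is required")
    if any(not -1.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [-1, 1]")
    if any(not 0.0 <= m <= 1.0 for m in metrics):
        raise ValueError("metric values must lie in [0, 1]")
    return sum(w * m for w, m in zip(weights, metrics))


value = quality_function([0.5, -1.0, 0.25], [1.0, 0.2, 0.8])  # approximately 0.5
```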
Classification Function
[Diagram: Data and Metrics are the inputs of the classification function]
Classification Function
The function assigns a positive value to good posts and a negative value to bad posts.
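A genetic algorithm can search the weights $w_i \in [-1, 1]$ so that good posts score positive and bad posts score negative. The slides do not describe the operators, so this is only a sketch under assumptions: fitness is the fraction of posts whose $QF$ sign matches the label, with tournament selection, uniform crossover, clipped Gaussian mutation, and elitism chosen for illustration.

```python
import random

Post = tuple[list[float], bool]  # (normalized metric values, is_good)


def qf(weights: list[float], metrics: list[float]) -> float:
    return sum(w * m for w, m in zip(weights, metrics))


def fitness(weights: list[float], posts: list[Post]) -> float:
    """Fraction of posts whose QF sign matches the label (good > 0, bad < 0)."""
    hits = 0
    for metrics, good in posts:
        value = qf(weights, metrics)
        if (good and value > 0) or (not good and value < 0):
            hits += 1
    return hits / len(posts)


def _clip(x: float) -> float:
    return max(-1.0, min(1.0, x))  # keep every weight inside [-1, 1]


def _tournament(rng: random.Random, population, posts, k: int = 3):
    return max(rng.sample(population, k), key=lambda w: fitness(w, posts))


def evolve_weights(posts: list[Post], n_metrics: int, pop_size: int = 30,
                   generations: int = 40, mutation_rate: float = 0.2,
                   seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_metrics)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, posts), reverse=True)
        next_gen = ranked[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            p1 = _tournament(rng, ranked, posts)
            p2 = _tournament(rng, ranked, posts)
            # uniform crossover, then clipped Gaussian mutation per gene
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [_clip(g + rng.gauss(0.0, 0.3)) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda w: fitness(w, posts))


toy_posts = [([0.9, 0.1], True), ([0.8, 0.2], True),
             ([0.1, 0.9], False), ([0.2, 0.7], False)]
best = evolve_weights(toy_posts, n_metrics=2)
```

Elitism keeps the best fitness from decreasing across generations; on real data the fitness would more likely target the class labels A to D rather than a binary good/bad split.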
Classification Function
[Figure: distribution of x = QF(post) over [−1, 1] (y = freq(x)); quantiles split the posts into classes D, C, B, A from left to right, with the outer quantiles set to q = 0.25 (25%), 10%, and 40% in successive builds]
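The quantile split can be sketched as follows: posts in the lowest fraction q of QF values form the D set and posts in the highest fraction q form the A set. Reading A(M, q) and D(M, q) in the results table as these sets (for the classifier built on metric set M) is an interpretation, and rounding the cut size down is an assumption.

```python
def quantile_sets(qf_scores: dict[str, float], q: float) -> tuple[set[str], set[str]]:
    """Return (A set, D set): top-q and bottom-q fractions of posts by QF."""
    if not 0.0 < q <= 0.5:
        raise ValueError("q must be in (0, 0.5]")
    ranked = sorted(qf_scores, key=qf_scores.get)  # post ids by ascending QF
    k = int(len(ranked) * q)                        # number of posts per tail
    d_set = set(ranked[:k])                         # lowest QF values: class D
    a_set = set(ranked[len(ranked) - k:])           # highest QF values: class A
    return a_set, d_set


scores = {"p1": -0.9, "p2": -0.5, "p3": -0.2, "p4": 0.0,
          "p5": 0.1, "p6": 0.3, "p7": 0.6, "p8": 0.8}
a_set, d_set = quantile_sets(scores, 0.25)  # A = {p7, p8}, D = {p1, p2}
```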
Review Queue Refinement
Data: StackOverflow Public Dump, StackOverflow Private Dump (Low Quality Post review queue)
Review Queue Refinement
[Diagram: posts in the Review Queue (RQ) that the classifier labels A (q = 0.25) are removed from the queue]
[Diagram: only posts in the RQ that the classifier labels D (q = 0.1) are kept (intersection, ∩)]
[Diagram: posts labeled A by either of two classifiers (q = 0.25 and q = 0.1, combined by union, ∪) are removed from the RQ]
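The three refinement strategies above are plain set operations on post identifiers. A minimal sketch, assuming the review queue and the classifier outputs are available as sets of post ids (the ids below are hypothetical):

```python
def remove_good(rq: set[str], *a_sets: set[str]) -> set[str]:
    """RQ minus the union of one or more A sets (posts predicted as very good)."""
    return rq - set().union(*a_sets)


def keep_bad(rq: set[str], d_set: set[str]) -> set[str]:
    """RQ intersected with a D set (posts predicted as very bad)."""
    return rq & d_set


rq = {"p1", "p2", "p3", "p4", "p5"}
refined_union = remove_good(rq, {"p4"}, {"p5", "p9"})  # {"p1", "p2", "p3"}
refined_inter = keep_bad(rq, {"p1", "p2", "p8"})       # {"p1", "p2"}
```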
Review Queue Refinement
Hard Precision (HP): the percentage of posts in the review queue that belong to class D
Soft Precision (SP): the percentage of posts in the review queue that belong to class C or D
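Both precision measures are simple proportions over the class labels of the posts left in the queue. The sketch below applies them to the illustrative queue from the refinement diagrams (D D D D C C B B A A A A); representing the queue as a list of class labels is an assumption.

```python
from collections import Counter


def hard_precision(queue_labels: list[str]) -> float:
    """Share of review-queue posts labeled D (very bad)."""
    return Counter(queue_labels)["D"] / len(queue_labels)


def soft_precision(queue_labels: list[str]) -> float:
    """Share of review-queue posts labeled C or D (bad or very bad)."""
    counts = Counter(queue_labels)
    return (counts["C"] + counts["D"]) / len(queue_labels)


rq_labels = "D D D D C C B B A A A A".split()
print(hard_precision(rq_labels), soft_precision(rq_labels))  # prints: 0.3333333333333333 0.5
```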
Review Queue Refinement
Without refinement: Review Queue (RQ) size 3,416; Hard Precision (HP) 41.90%; Soft Precision (SP) 64.31%
Without refinement: RQ size 3,416, HP 41.90%, SP 64.31%

| Model | RQ Size | RQ Reduction | Hard Precision | Soft Precision | A Red. | B Red. | C Red. | D Red. |
|---|---|---|---|---|---|---|---|---|
| RQ \ A(MP, 0.25) | 3,108 | 9.02% | 45.30% | 68.95% | 27.19% | 19.37% | 3.80% | 1.74% |
| RQ \ (A(MP, 0.25) ∪ A(MSO, 0.05)) | 2,650 | 22.42% | 49.81% | 74.87% | 53.07% | 43.59% | 13.09% | 7.89% |
| RQ \ (A(MP, 0.25) ∪ A(MSO, 0.05)) | 2,529 | 25.97% | 51.60% | 74.02% | 53.51% | 44.40% | 25.79% | 8.93% |
| RQ \ A(MP, 0.33) | 2,552 | 25.29% | 50.90% | 76.65% | 60.96% | 48.84% | 14.01% | 9.35% |
| RQ \ A(MR, 0.33) | 2,505 | 26.67% | 51.62% | 73.81% | 50.88% | 45.11% | 27.23% | 9.77% |
| RQ \ A(MR, 0.40) | 2,300 | 32.67% | 54.91% | 77.91% | 65.79% | 56.61% | 30.76% | 11.86% |
| RQ \ A(MP, 0.40) | 2,421 | 29.13% | 51.92% | 78.40% | 67.54% | 54.69% | 16.10% | 12.28% |
| RQ ∩ D(MP, 0.40) | 2,244 | 34.31% | 52.90% | 79.63% | 71.93% | 60.34% | 21.47% | 17.17% |
| RQ ∩ D(MR, 0.40) | 1,912 | 44.03% | 60.67% | 85.15% | 85.53% | 74.67% | 38.74% | 19.05% |
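The reduction columns can be reproduced from queue sizes and class counts. In this sketch "RQ Reduction" is the relative shrinkage of the queue, and "A Red." through "D Red." are read as the share of posts of that class removed by the refinement (an interpretation of the column headers). The first two prints check the RQ Reduction values of the first and last table rows; the class example reuses the illustrative queues from the refinement diagrams.

```python
from collections import Counter


def rq_reduction(before: int, after: int) -> float:
    """Percentage by which the review queue shrinks."""
    return 100 * (before - after) / before


def class_reduction(before_labels: list[str], after_labels: list[str], cls: str) -> float:
    """Percentage of posts of class `cls` removed by the refinement."""
    n_before = Counter(before_labels)[cls]
    if n_before == 0:
        raise ValueError(f"no posts of class {cls} before refinement")
    return 100 * (n_before - Counter(after_labels)[cls]) / n_before


print(round(rq_reduction(3416, 3108), 2))  # prints: 9.02
print(round(rq_reduction(3416, 1912), 2))  # prints: 44.03

before = "D D D D C C B B A A A A".split()
after = "D D D C B B A".split()
a_red = class_reduction(before, after, "A")  # 75.0
d_red = class_reduction(before, after, "D")  # 25.0
```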
Lessons Learned
Readability and Popularity Metrics are the most effective for queue refinement
There is a tradeoff between review queue reduction and bad post reduction

block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 

Improving Low Quality Stack Overflow Post Detection

  • 1. Improving Low Quality StackOverflow Post Detection. Luca Ponzanelli, Andrea Mocci (University of Lugano, Switzerland); Alberto Bacchelli (Delft University of Technology, Netherlands); Michele Lanza (University of Lugano, Switzerland); David Fullerton (StackExchange Inc., New York, USA).
  • 2–4. StackOverflow: a question receives answers; more than 6,000 questions are posted every day.
  • 5–6. StackOverflow review process: incoming questions (Q) are handled by moderators and by the system.
  • 7. StackOverflow review process: it covers Suggested Edits, Late Answers, First Posts, and Low Quality Posts.
  • 8. StackOverflow review process: Low Quality Posts are identified by the system.
  • 9. StackOverflow review process: an inefficient Low Quality Posts detection approach increases the review queue size.
  • 10. StackOverflow review process: an efficient Low Quality Posts detection approach saves reviewers time.
  • 11. StackOverflow review process: refine the Low Quality Posts review queue to remove misclassified posts.
  • 12–13. StackOverflow quality metrics (pure textual metrics): Body Length, Capital Title, Emails Count, Lowercase Percentage, Spaces Count, Tags Count, Text Speak Count, Title Body Similarity, Title Length, Uppercase Percentage.
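The slides only name the textual metrics. As a rough illustration, the following Python sketch computes several of them for a single post. Every definition here is an assumption rather than the authors' implementation: "Capital Title" is read as "the title starts with an uppercase letter", "Title Body Similarity" is approximated with the Jaccard similarity of the two word sets, "Spaces Count" counts space characters, and `TEXT_SPEAK` is a hypothetical word list.

```python
import re

# Hypothetical "text speak" vocabulary; the list behind the real metric is not given.
TEXT_SPEAK = {"u", "ur", "plz", "pls", "thx", "lol", "r"}
# Simple e-mail pattern: local part, "@", domain with at least one dot.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def words(text: str) -> list[str]:
    """Split text into alphabetic words (apostrophes allowed)."""
    return re.findall(r"[A-Za-z']+", text)


def textual_metrics(title: str, body: str, tags: list[str]) -> dict[str, float]:
    letters = [c for c in body if c.isalpha()]
    n_letters = len(letters) or 1  # avoid division by zero on letter-free bodies
    title_words = {w.lower() for w in words(title)}
    body_words = {w.lower() for w in words(body)}
    union = title_words | body_words
    return {
        "body_length": len(body),
        "title_length": len(title),
        # Assumption: 1.0 if the title starts with an uppercase letter.
        "capital_title": float(title[:1].isupper()),
        "emails_count": len(EMAIL_RE.findall(body)),
        "spaces_count": body.count(" "),
        "tags_count": len(tags),
        "uppercase_percentage": sum(c.isupper() for c in letters) / n_letters,
        "lowercase_percentage": sum(c.islower() for c in letters) / n_letters,
        "text_speak_count": sum(w.lower() in TEXT_SPEAK for w in words(body)),
        # Assumption: Jaccard similarity of title and body word sets.
        "title_body_similarity": len(title_words & body_words) / len(union) if union else 0.0,
    }
```

For example, `textual_metrics("How to sort a list", "plz help me sort a list, email me at a@b.com", ["python"])` flags one e-mail address and one text-speak word. Percentages are returned as fractions in [0, 1], which is convenient if the metrics later feed a weighted sum over normalized inputs.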
  • 14. Metric families: textual metrics (the StackOverflow quality metrics), readability metrics, and popularity metrics.
  • 15–17. Readability metrics: Average Term Entropy, Automated Readability Index, Coleman-Liau Index, Flesch-Kincaid Grade Level, Flesch Reading Ease Score, Gunning Fog Index, LOC Percentage, Metric Entropy, Sentences Count, SMOG Grade, Words Count.
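Several of the readability metrics are standard formulas over word, sentence, letter, and syllable counts. The sketch below computes five of them with the usual published coefficients. It is not the authors' tooling: the syllable counter is a crude vowel-group heuristic, sentence splitting on `.`, `!`, `?` is naive, ARI uses letters instead of all characters, and the entropy, LOC, and SMOG metrics are not covered.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, dropping one for a trailing silent 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)


def readability(text: str) -> dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sent, n_words = max(len(sentences), 1), max(len(words), 1)  # guard empty input
    n_letters = sum(len(w) for w in words)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(s >= 3 for s in syllables)  # "complex" = 3+ syllables (Fog)
    wps = n_words / n_sent           # average words per sentence
    spw = sum(syllables) / n_words   # average syllables per word
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "automated_readability_index": 4.71 * (n_letters / n_words) + 0.5 * wps - 21.43,
        # Coleman-Liau: L = letters per 100 words, S = sentences per 100 words.
        "coleman_liau_index": 0.0588 * (100 * n_letters / n_words)
        - 0.296 * (100 * n_sent / n_words) - 15.8,
        "gunning_fog_index": 0.4 * (wps + 100 * complex_words / n_words),
        "words_count": len(words),
        "sentences_count": len(sentences),
    }
```

On a very simple text such as "The cat sat. The dog ran." the Flesch Reading Ease score is high (easy) and the grade-level indices are low or even negative, which is expected for such short words and sentences.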
  • 18. Popularity metrics: Accepted by Originator Votes, Approved Edit Suggestion, Answer Badges Count, Badges-Tags Coverage, Bounty Start (End) Votes, Close Votes, Deletion Votes, Down Votes, Favorite Votes, Moderator Review Votes, Offensive Votes, Reopen Votes, Question Badges Count, Spam Votes, Total Badges, Undeletion Votes, Up Votes.
  • 19–20. Classification approach: the StackOverflow public dump of September 2013, containing 5,648,975 questions.
  • 21–26. Classification approach: questions in the StackOverflow public dump are divided into four quality classes, following L. Ponzanelli, A. Mocci, A. Bacchelli, M. Lanza, "Understanding and Classifying the Quality of Technical Forum Questions", in Proceedings of QSIC 2014 (14th International Conference on Quality Software):
      Very Good (A): neither closed nor deleted, with an accepted answer, score > 7.
      Good (B): neither closed nor deleted, with an accepted answer, 1 < score < 6.
      Bad (C): neither closed nor deleted, with an accepted answer, score < 0.
      Very Bad (D): closed or deleted.
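The class rules can be written as a small labeling function. This is a sketch that encodes the thresholds exactly as transcribed from the slides; the `Question` type is hypothetical, and posts that fall outside the stated conditions (for example a score of 6 or 7, or no accepted answer) are left unlabeled rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """Hypothetical minimal view of a question from the public dump."""
    score: int
    closed: bool
    deleted: bool
    has_accepted_answer: bool


def quality_class(q: Question) -> Optional[str]:
    """Return 'A'..'D' using the thresholds as written on the slides, or None."""
    if q.closed or q.deleted:
        return "D"  # Very Bad
    if not q.has_accepted_answer:
        return None  # not covered by the transcribed rules
    if q.score > 7:
        return "A"  # Very Good
    if 1 < q.score < 6:
        return "B"  # Good
    if q.score < 0:
        return "C"  # Bad
    return None  # scores 0, 1, 6, and 7 fall between the transcribed thresholds
```

Returning `None` for uncovered posts keeps the gaps in the rules visible instead of silently assigning them to a neighbouring class.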
  • 28. Classification function (genetic algorithm): $QF = \sum_{i=1}^{n} w_i \cdot m_i$, with $w_i \in [-1, 1]$ and $m_i \in [0, 1]$.
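A minimal sketch of the quality function follows. It assumes each raw metric is min-max normalized over the dataset to satisfy $m_i \in [0, 1]$; the slides state the range but not how it is obtained, so `min_max_normalize` is an illustrative choice.

```python
def min_max_normalize(column: list[float]) -> list[float]:
    """Rescale one metric's values across all posts to [0, 1] (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant metric carries no information
    return [(v - lo) / (hi - lo) for v in column]


def quality_function(weights: list[float], metrics: list[float]) -> float:
    """QF = sum_i w_i * m_i, with w_i in [-1, 1] and m_i in [0, 1]."""
    if len(weights) != len(metrics):
        raise ValueError("one weight per metric is required")
    if any(not -1.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [-1, 1]")
    if any(not 0.0 <= m <= 1.0 for m in metrics):
        raise ValueError("metrics must be normalized to [0, 1]")
    return sum(w * m for w, m in zip(weights, metrics))
```

A negative weight lets a metric pull QF down (for instance a high uppercase percentage), while a positive weight pushes it up; the validation checks simply enforce the ranges stated on the slide.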
  • 29–30. Classification function: built from the data and the metrics computed on it.
  • 31. Classification function: the function assigns a positive value to good posts and a negative value to bad posts.
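One way a genetic algorithm could search the weights of QF is sketched below. The slides do not describe the operators, so everything here is an assumption for illustration: fitness is the share of labeled posts whose QF sign matches the label (+1 good, -1 bad), with tournament selection, uniform crossover, Gaussian mutation clipped to [-1, 1], elitism, and arbitrary parameter values.

```python
import random


def fitness(weights, posts):
    """Share of (metrics, label) pairs whose QF sign matches the label (+1 good, -1 bad)."""
    hits = 0
    for metrics, label in posts:
        qf = sum(w * m for w, m in zip(weights, metrics))
        hits += int(qf > 0) if label > 0 else int(qf < 0)
    return hits / len(posts)


def clip(value, low=-1.0, high=1.0):
    """Keep a weight inside [low, high]."""
    return max(low, min(high, value))


def evolve_weights(posts, n_metrics, pop_size=30, generations=40,
                   tournament=3, mutation_sd=0.2, seed=0):
    """Evolve weight vectors in [-1, 1]^n that maximize fitness (illustrative GA)."""
    rng = random.Random(seed)

    def score(w):
        return fitness(w, posts)

    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_metrics)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = [max(population, key=score)]  # elitism: keep the current best
        while len(next_gen) < pop_size:
            # Tournament selection: best of a few randomly sampled individuals.
            p1 = max(rng.sample(population, tournament), key=score)
            p2 = max(rng.sample(population, tournament), key=score)
            # Uniform crossover (each gene from either parent), then Gaussian mutation.
            child = [clip(rng.choice(genes) + rng.gauss(0.0, mutation_sd))
                     for genes in zip(p1, p2)]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)
```

Usage: with `posts = [([0.9, 0.1], 1), ([0.2, 0.8], -1)]`, calling `evolve_weights(posts, n_metrics=2)` returns a weight vector that rewards the first metric and penalizes the second. Elitism guarantees the best fitness never decreases between generations.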
  • 32–34. Classification function (figure): frequency distribution y = freq(x) of x = QF(post), with axis ticks at −1, 0, and 1 and the classes D, C, B, A ordered from low to high QF; quantile cut-offs (q = 0.25) mark the two tails, and the build shows tail shares of 25%, 10%, and 40%.
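The figure can be read as "the top q fraction of QF values forms class A and the bottom q fraction forms class D". That reading is an interpretation of the plot, not a stated definition. Under it, the two tails can be extracted by ranking posts by QF; the sketch below uses a hypothetical `tail_classes` helper and leaves the middle classes C and B unassigned.

```python
def tail_classes(scores: dict[str, float], q: float) -> tuple[set[str], set[str]]:
    """Return (A, D): ids in the top and bottom q-fraction of QF scores (assumed reading)."""
    if not 0 < q <= 0.5:
        raise ValueError("q must be in (0, 0.5] so the two tails do not overlap")
    ranked = sorted(scores, key=scores.get)   # post ids in ascending QF order
    k = int(q * len(ranked) + 1e-9)           # epsilon absorbs float error, e.g. 0.29 * 100
    bottom = set(ranked[:k])                  # lowest QF values -> D
    top = set(ranked[len(ranked) - k:])       # highest QF values -> A (empty when k == 0)
    return top, bottom
```

Slicing with `len(ranked) - k` rather than `-k` avoids returning the whole list when `k` is 0.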
  • 35–36. Review queue refinement: data from the StackOverflow public dump and from a StackOverflow private dump of Low Quality Posts.
  • 39–44. Review queue refinement (figure): posts in the review queue (RQ) carry the classes D, C, B, A assigned by a model. The build shows removing from RQ the posts classified as A (q = 0.25), intersecting RQ with the posts classified as D (RQ ∩ D, q = 0.1), and a union (∪) of the sets obtained at q = 0.25 and q = 0.1.
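The refinements in the figure are plain set operations over post ids. In this sketch the A and D sets are assumed to come from a model's quantile-based classification, and the function names are hypothetical.

```python
def refine_remove(review_queue: set, *a_sets: set) -> set:
    r"""RQ \ (A_1 ∪ ... ∪ A_k): drop queued posts that any model rates Very Good (A)."""
    return review_queue - set().union(*a_sets)


def refine_keep(review_queue: set, d_set: set) -> set:
    """RQ ∩ D: keep only queued posts that the model rates Very Bad (D)."""
    return review_queue & d_set
```

Removing A posts is conservative (it only drops posts a model is confident are good), while intersecting with D is aggressive (it keeps only posts a model is confident are bad), so the second shrinks the queue more.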
  • 45. Review queue refinement metrics: Hard Precision (HP) is the percentage of posts in the review queue that belong to class D; Soft Precision (SP) is the percentage of posts in the review queue that belong to class C or D.
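Given ground-truth classes for the posts in a queue, HP and SP follow directly from their definitions. The per-class reduction (the percentage of a class's posts removed by refinement) is an assumed reading of the per-class reduction columns in the results; the function names are hypothetical.

```python
def precision_metrics(queue, labels):
    """Return (HP, SP) in percent: share of queued posts in class D, and in C or D."""
    n = len(queue)
    if n == 0:
        return 0.0, 0.0
    d = sum(labels[p] == "D" for p in queue)
    cd = sum(labels[p] in ("C", "D") for p in queue)
    return 100 * d / n, 100 * cd / n


def class_reduction(original, refined, labels, cls):
    """Assumed definition: % of class-`cls` posts in the original queue removed by refinement."""
    before = [p for p in original if labels[p] == cls]
    if not before:
        return 0.0
    removed = sum(p not in refined for p in before)
    return 100 * removed / len(before)
```

A good refinement raises HP and SP while keeping the D reduction low, since removing bad posts from the queue means they escape review.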
  • 46. Review queue without refinement: review queue (RQ) size 3,416; Hard Precision (HP) 41.90%; Soft Precision (SP) 64.31%.
  • 47–52. Review queue refinement results (baseline without refinement: RQ size 3,416, HP 41.90%, SP 64.31%):

    Model | RQ Size | RQ Reduction | Hard Precision | Soft Precision | A Reduction | B Reduction | C Reduction | D Reduction
    RQ \ A(MP, 0.25) | 3,108 | 9.02% | 45.30% | 68.95% | 27.19% | 19.37% | 3.80% | 1.74%
    RQ \ (A(MP, 0.25) ∪ A(MSO, 0.05)) | 2,650 | 22.42% | 49.81% | 74.87% | 53.07% | 43.59% | 13.09% | 7.89%
    RQ \ (A(MP, 0.25) ∪ A(MSO, 0.05)) | 2,529 | 25.97% | 51.60% | 74.02% | 53.51% | 44.40% | 25.79% | 8.93%
    RQ \ A(MP, 0.33) | 2,552 | 25.29% | 50.90% | 76.65% | 60.96% | 48.84% | 14.01% | 9.35%
    RQ \ A(MR, 0.33) | 2,505 | 26.67% | 51.62% | 73.81% | 50.88% | 45.11% | 27.23% | 9.77%
    RQ \ A(MR, 0.40) | 2,300 | 32.67% | 54.91% | 77.91% | 65.79% | 56.61% | 30.76% | 11.86%
    RQ \ A(MP, 0.40) | 2,421 | 29.13% | 51.92% | 78.40% | 67.54% | 54.69% | 16.10% | 12.28%
    RQ ∩ D(MP, 0.40) | 2,244 | 34.31% | 52.90% | 79.63% | 71.93% | 60.34% | 21.47% | 17.17%
    RQ ∩ D(MR, 0.40) | 1,912 | 44.03% | 60.67% | 85.15% | 85.53% | 74.67% | 38.74% | 19.05%
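The RQ Reduction column is consistent with 1 − (refined queue size / 3,416), expressed as a percentage and rounded to two decimals. The following Python check recomputes it for every row from the reported queue sizes.

```python
BASELINE_SIZE = 3416  # review queue size without refinement

# (refined queue size, reported RQ Reduction in %) for each row of the results table
rows = [(3108, 9.02), (2650, 22.42), (2529, 25.97), (2552, 25.29), (2505, 26.67),
        (2300, 32.67), (2421, 29.13), (2244, 34.31), (1912, 44.03)]


def rq_reduction(size: int, baseline: int = BASELINE_SIZE) -> float:
    """Percentage of the baseline queue removed by refinement, rounded to 2 decimals."""
    return round(100 * (1 - size / baseline), 2)


for size, reported in rows:
    assert rq_reduction(size) == reported, (size, reported)
```

The strongest refinement, RQ ∩ D(MR, 0.40), removes 1,504 of the 3,416 queued posts.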
  • 53–54. Lessons learned: readability and popularity metrics are the most effective for review queue refinement, and there is a tradeoff between review queue reduction and bad post reduction (larger queue reductions also remove more bad posts from the queue).