AN INSIGHT INTO THE UNRESOLVED QUESTIONS AT STACK OVERFLOW
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science, University of Saskatchewan
Presented by: Ripon K. Saha
12th Working Conference on Mining Software Repositories (MSR 2015), Challenge Track
Florence, Italy
RESEARCH PROBLEM: HIGHER RATE OF UNRESOLVED QUESTIONS
• Unresolved question: a question for which none of the answers was accepted as a solution.
• Exponential increase in unresolved questions over the last 6 years.
• 2.4M (27%) unresolved out of 8.8M questions at SO (Feb 2015).
RQ1: Why do questions at Stack Overflow remain unresolved for a long time?
RQ2: Can we predict the questions for which none of the answers might be accepted as solutions?
ASPECTS OF STUDY
• Comparative analysis (RQ1) between resolved and unresolved questions using four aspects:
• Lexical Analysis
  • Code Readability (CR)
  • Text Readability (TR)
• Semantic Analysis
  • Topic Similarity (TS)
  • Topic Entropy (TE)
• User Behaviour Analysis
  • Answer Rejection Ratio (ARR)
  • Last Access Delay (LAD)
• Popularity Analysis
  • Votes for Questions (V)
  • Reputation of Question Owners (R)
Dataset Used
• 3,956 unresolved questions and 4,101 resolved questions.
• Each question has at least 10 answers.
CODE & TEXT READABILITY
• Existing readability tools used: Buse and Weimer (TSE, 2010) and Readability Grade Levels (Ponzanelli et al., ICSME, 2014).
• Distribution fitting curves of readability.
• No significant difference in readability between the two types of questions.
TOPIC SIMILARITY & TOPIC ENTROPY
• Mallet (McCallum, 2002) used for topic modeling.
• Topic Similarity (Fig-a) between questions and their corresponding answers is identical for both question types.
• Topic Entropy (i.e., topic uncertainty) (Fig-b) is higher for unresolved questions: unresolved questions are less specific about the topics of their requirements.
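The slides do not give the exact formula for Topic Entropy. A minimal sketch, assuming it is the Shannon entropy of a question's topic distribution (the per-topic weights a topic model would assign), might look like the following. The function name `topic_entropy` and the two example distributions are hypothetical and are not taken from the study.

```python
import math


def topic_entropy(topic_weights):
    """Shannon entropy (in bits) of a topic distribution.

    Assumption: the weights are non-negative; they are normalised here so
    that they sum to 1 before the entropy is computed.
    """
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    entropy = 0.0
    for weight in topic_weights:
        p = weight / total
        if p > 0:  # a zero-probability topic contributes nothing
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical distributions over four topics (not data from the study).
focused = [0.97, 0.01, 0.01, 0.01]   # question dominated by one topic
diffuse = [0.25, 0.25, 0.25, 0.25]   # question spread evenly over topics

print(topic_entropy(focused))
print(topic_entropy(diffuse))
```

Under this reading, a question spread evenly over many topics gets a higher entropy than one dominated by a single topic, which matches the slide's description of entropy as topic uncertainty.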
USER BEHAVIOUR ANALYSIS
• Distribution fitting curves of rejection ratio.
• Owners of unresolved questions have a greater answer rejection ratio.
• Owners of unresolved questions are less frequently active at Stack Overflow.
POPULARITY ANALYSIS
• Used question votes and user reputation.
• Unresolved questions are less popular than resolved questions.
• Owners of unresolved questions have lower reputation.
PREDICTION MODELS (RQ2)
Algorithm             Metrics                 Overall Accuracy   Precision (Unresolved)   Recall (Unresolved)
J48                   {TE, ARR, LAD, V, R}    78.11%             78.70%                   76.10%
J48                   {ARR, LAD, V}           77.90%             79.60%                   73.90%
Logistic Regression   {TE, ARR, LAD, V, R}    73.58%             72.60%                   74.20%
Logistic Regression   {ARR, LAD, V}           73.28%             71.70%                   75.20%
Naïve Bayes           {TE, ARR, LAD, V, R}    71.69%             69.50%                   75.50%
Naïve Bayes           {ARR, LAD, V}           74.48%             80.00%                   64.00%
• Three prediction models from WEKA, evaluated with 10-fold cross-validation.
• Best overall result (J48 with all five features): 78.11% prediction accuracy with 78.70% precision and 76.10% recall.
• The identified features are satisfactorily predictive.
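For readers less familiar with the reported measures, the sketch below shows how accuracy, precision and recall for the unresolved class follow from a confusion matrix, and how a majority-class baseline can be computed for comparison. The confusion counts are hypothetical and are not results from the study; only the class sizes (3,956 unresolved and 4,101 resolved questions) come from the slides. The function names are illustrative, not part of WEKA.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive class.

    Here the positive class is "unresolved":
      tp = unresolved questions predicted as unresolved
      fp = resolved questions predicted as unresolved
      fn = unresolved questions predicted as resolved
      tn = resolved questions predicted as resolved
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def majority_baseline_accuracy(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)


# Hypothetical confusion counts for one test fold (not data from the study).
accuracy, precision, recall = classification_metrics(tp=76, fp=21, fn=24, tn=79)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")

# Class sizes taken from the dataset slide: 3,956 unresolved, 4,101 resolved.
baseline = majority_baseline_accuracy([3956, 4101])
print(f"majority-class baseline accuracy={baseline:.4f}")
```

Because the two classes are nearly balanced, always predicting "resolved" would be right only about half the time, which gives some context for the accuracies in the table.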
TAKE-HOME MESSAGE
• 27% of SO questions are unresolved, and their number is increasing almost exponentially.
• Unresolved questions are ambiguous, less focused and less popular.
• Owners of unresolved questions have lower reputation and are less frequently active at SO.
• The identified features can satisfactorily separate unresolved from resolved questions.
• The findings can assist in question quality management at SO.
THANK YOU!!

More Related Content

What's hot

Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science CommunicationIsabelle Augenstein
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...Lifeng (Aaron) Han
 
PARCC Grade 6 Math
PARCC Grade 6 MathPARCC Grade 6 Math
PARCC Grade 6 MathJon Lewis
 
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...Isabelle Augenstein
 
PARCC Grade 5 Math
PARCC Grade 5 Math PARCC Grade 5 Math
PARCC Grade 5 Math Jon Lewis
 
Helping Prospective Students Understand the Computing Disciplines
Helping Prospective Students Understand the Computing DisciplinesHelping Prospective Students Understand the Computing Disciplines
Helping Prospective Students Understand the Computing DisciplinesRandy Connolly
 
Attracting Women to Computing and Why it Matters
Attracting Women to Computing and Why it MattersAttracting Women to Computing and Why it Matters
Attracting Women to Computing and Why it MattersGail Carmichael
 
Asking Clarifying Questions in Open-Domain Information-Seeking Conversations
Asking Clarifying Questions in Open-Domain Information-Seeking ConversationsAsking Clarifying Questions in Open-Domain Information-Seeking Conversations
Asking Clarifying Questions in Open-Domain Information-Seeking ConversationsMohammad Aliannejadi
 
Computational Exploration of the Linguistic Structures of Future-Oriented Exp...
Computational Exploration of the Linguistic Structures of Future-Oriented Exp...Computational Exploration of the Linguistic Structures of Future-Oriented Exp...
Computational Exploration of the Linguistic Structures of Future-Oriented Exp...Jinho Choi
 
Semantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-AnsweringSemantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-AnsweringJinho Choi
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Pieter Heyvaert
 
Question Answering for Machine Reading Evaluation on Romanian and English
Question Answering for Machine Reading Evaluation on Romanian and EnglishQuestion Answering for Machine Reading Evaluation on Romanian and English
Question Answering for Machine Reading Evaluation on Romanian and EnglishFaculty of Computer Science
 
NAACL2015 presentation
NAACL2015 presentationNAACL2015 presentation
NAACL2015 presentationHan Xu, PhD
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
 

What's hot (18)

Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
 
PARCC Grade 6 Math
PARCC Grade 6 MathPARCC Grade 6 Math
PARCC Grade 6 Math
 
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
 
Ela g3
Ela g3Ela g3
Ela g3
 
Ela g7
Ela g7Ela g7
Ela g7
 
Social networks
Social networksSocial networks
Social networks
 
PARCC Grade 5 Math
PARCC Grade 5 Math PARCC Grade 5 Math
PARCC Grade 5 Math
 
Helping Prospective Students Understand the Computing Disciplines
Helping Prospective Students Understand the Computing DisciplinesHelping Prospective Students Understand the Computing Disciplines
Helping Prospective Students Understand the Computing Disciplines
 
Attracting Women to Computing and Why it Matters
Attracting Women to Computing and Why it MattersAttracting Women to Computing and Why it Matters
Attracting Women to Computing and Why it Matters
 
Asking Clarifying Questions in Open-Domain Information-Seeking Conversations
Asking Clarifying Questions in Open-Domain Information-Seeking ConversationsAsking Clarifying Questions in Open-Domain Information-Seeking Conversations
Asking Clarifying Questions in Open-Domain Information-Seeking Conversations
 
Computational Exploration of the Linguistic Structures of Future-Oriented Exp...
Computational Exploration of the Linguistic Structures of Future-Oriented Exp...Computational Exploration of the Linguistic Structures of Future-Oriented Exp...
Computational Exploration of the Linguistic Structures of Future-Oriented Exp...
 
Semantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-AnsweringSemantics-based Graph Approach to Complex Question-Answering
Semantics-based Graph Approach to Complex Question-Answering
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
 
Resume
ResumeResume
Resume
 
Question Answering for Machine Reading Evaluation on Romanian and English
Question Answering for Machine Reading Evaluation on Romanian and EnglishQuestion Answering for Machine Reading Evaluation on Romanian and English
Question Answering for Machine Reading Evaluation on Romanian and English
 
NAACL2015 presentation
NAACL2015 presentationNAACL2015 presentation
NAACL2015 presentation
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 

Similar to MSR2015-Challenge

Code-Review-COW56-Meeting
Code-Review-COW56-MeetingCode-Review-COW56-Meeting
Code-Review-COW56-MeetingMasud Rahman
 
R programming for psychometrics
R programming for psychometricsR programming for psychometrics
R programming for psychometricsDiane Talley
 
CodeInsight-SCAM2015
CodeInsight-SCAM2015CodeInsight-SCAM2015
CodeInsight-SCAM2015Masud Rahman
 
Topic Set Size Design with the Evaluation Measures for Short Text Conversation
Topic Set Size Design with the Evaluation Measures for Short Text ConversationTopic Set Size Design with the Evaluation Measures for Short Text Conversation
Topic Set Size Design with the Evaluation Measures for Short Text ConversationTetsuya Sakai
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2Saman Sara
 
Rubric Detail A rubric lists grading criteria that instruct.docx
  Rubric Detail  A rubric lists grading criteria that instruct.docx  Rubric Detail  A rubric lists grading criteria that instruct.docx
Rubric Detail A rubric lists grading criteria that instruct.docxrobert345678
 
How to conduct systematic literature review
How to conduct systematic literature reviewHow to conduct systematic literature review
How to conduct systematic literature reviewKashif Hussain
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Preetha Chatterjee
 
Zouaq wole2013
Zouaq wole2013Zouaq wole2013
Zouaq wole2013Amal Zouaq
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresIJwest
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresdannyijwest
 
A Set of Heuristics to Support Early Identification of Conflicting Requirements
A Set of Heuristics to Support Early Identification of Conflicting RequirementsA Set of Heuristics to Support Early Identification of Conflicting Requirements
A Set of Heuristics to Support Early Identification of Conflicting RequirementsAlejandro Salado
 
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...Kyoshiro Sugiyama
 
CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016Masud Rahman
 
Query Recommendation - Barcelona 2017
Query Recommendation - Barcelona 2017Query Recommendation - Barcelona 2017
Query Recommendation - Barcelona 2017Puya - Hossein Vahabi
 
An IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineAn IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineMasud Rahman
 
SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)
SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)
SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)Masud Rahman
 

Similar to MSR2015-Challenge (20)

MSR2017-RevHelper
MSR2017-RevHelperMSR2017-RevHelper
MSR2017-RevHelper
 
Code-Review-COW56-Meeting
Code-Review-COW56-MeetingCode-Review-COW56-Meeting
Code-Review-COW56-Meeting
 
R programming for psychometrics
R programming for psychometricsR programming for psychometrics
R programming for psychometrics
 
CodeInsight-SCAM2015
CodeInsight-SCAM2015CodeInsight-SCAM2015
CodeInsight-SCAM2015
 
STRICT-SANER2017
STRICT-SANER2017STRICT-SANER2017
STRICT-SANER2017
 
CORRECT-ICSE2016
CORRECT-ICSE2016CORRECT-ICSE2016
CORRECT-ICSE2016
 
Topic Set Size Design with the Evaluation Measures for Short Text Conversation
Topic Set Size Design with the Evaluation Measures for Short Text ConversationTopic Set Size Design with the Evaluation Measures for Short Text Conversation
Topic Set Size Design with the Evaluation Measures for Short Text Conversation
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2
 
Rubric Detail A rubric lists grading criteria that instruct.docx
  Rubric Detail  A rubric lists grading criteria that instruct.docx  Rubric Detail  A rubric lists grading criteria that instruct.docx
Rubric Detail A rubric lists grading criteria that instruct.docx
 
How to conduct systematic literature review
How to conduct systematic literature reviewHow to conduct systematic literature review
How to conduct systematic literature review
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
 
Zouaq wole2013
Zouaq wole2013Zouaq wole2013
Zouaq wole2013
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical features
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical features
 
A Set of Heuristics to Support Early Identification of Conflicting Requirements
A Set of Heuristics to Support Early Identification of Conflicting RequirementsA Set of Heuristics to Support Early Identification of Conflicting Requirements
A Set of Heuristics to Support Early Identification of Conflicting Requirements
 
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Q...
 
CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016
 
Query Recommendation - Barcelona 2017
Query Recommendation - Barcelona 2017Query Recommendation - Barcelona 2017
Query Recommendation - Barcelona 2017
 
An IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineAn IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search Engine
 
SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)
SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)
SurfClipse-- An IDE based context-aware Meta Search Engine (ERA Track)
 

More from Masud Rahman

HereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityHereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityMasud Rahman
 
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...Masud Rahman
 
PhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanPhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanMasud Rahman
 
PhD proposal of Masud Rahman
PhD proposal of Masud RahmanPhD proposal of Masud Rahman
PhD proposal of Masud RahmanMasud Rahman
 
PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanMasud Rahman
 
Doctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanDoctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanMasud Rahman
 
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Masud Rahman
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationMasud Rahman
 
RACK-Tool-ICSE2017
RACK-Tool-ICSE2017RACK-Tool-ICSE2017
RACK-Tool-ICSE2017Masud Rahman
 
QUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeQUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeMasud Rahman
 
ACER-ASE2017-slides
ACER-ASE2017-slidesACER-ASE2017-slides
ACER-ASE2017-slidesMasud Rahman
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018Masud Rahman
 
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...Masud Rahman
 
Improving IR-Based Bug Localization with Context-Aware-Query Reformulation
Improving IR-Based Bug Localization with Context-Aware-Query ReformulationImproving IR-Based Bug Localization with Context-Aware-Query Reformulation
Improving IR-Based Bug Localization with Context-Aware-Query ReformulationMasud Rahman
 

More from Masud Rahman (20)

HereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityHereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie University
 
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
 
PhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanPhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of Saskatchewan
 
PhD proposal of Masud Rahman
PhD proposal of Masud RahmanPhD proposal of Masud Rahman
PhD proposal of Masud Rahman
 
PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud Rahman
 
Doctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanDoctoral Symposium of Masud Rahman
Doctoral Symposium of Masud Rahman
 
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-Localization
 
MSR2017-Challenge
MSR2017-ChallengeMSR2017-Challenge
MSR2017-Challenge
 
MSR2014-Challenge
MSR2014-ChallengeMSR2014-Challenge
MSR2014-Challenge
 
STRICT-SANER2015
STRICT-SANER2015STRICT-SANER2015
STRICT-SANER2015
 
CMPT-842-BRACK
CMPT-842-BRACKCMPT-842-BRACK
CMPT-842-BRACK
 
RACK-Tool-ICSE2017
RACK-Tool-ICSE2017RACK-Tool-ICSE2017
RACK-Tool-ICSE2017
 
RACK-SANER2016
RACK-SANER2016RACK-SANER2016
RACK-SANER2016
 
QUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeQUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-Singapore
 
ACER-ASE2017-slides
ACER-ASE2017-slidesACER-ASE2017-slides
ACER-ASE2017-slides
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018
 
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge...
 
Improving IR-Based Bug Localization with Context-Aware-Query Reformulation
Improving IR-Based Bug Localization with Context-Aware-Query ReformulationImproving IR-Based Bug Localization with Context-Aware-Query Reformulation
Improving IR-Based Bug Localization with Context-Aware-Query Reformulation
 

Recently uploaded

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 

Recently uploaded (20)

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph

MSR2015-Challenge

  • 1. AN INSIGHT INTO THE UNRESOLVED QUESTIONS AT STACK OVERFLOW
       Mohammad Masudur Rahman, Chanchal K. Roy
       Department of Computer Science, University of Saskatchewan
       Presented By: Ripon K. Saha
       12th Working Conference on Mining Software Repositories (MSR 2015) (Challenge Track), Florence, Italy
  • 2. RESEARCH PROBLEM: HIGHER RATE OF UNRESOLVED QUESTIONS
       • Unresolved question: none of the answers was accepted as a solution.
       • Exponential increase over the last 6 years.
       • 2.4m (27%) unresolved out of 8.8m questions at SO (Feb, 2015)
       RQ1: Why do questions at Stack Overflow remain unresolved for long time?
       RQ2: Can we predict the questions for which none of the answers might be accepted as solutions?
  • 3. ASPECTS OF STUDY
       • Comparative analysis (RQ1) between questions using four aspects:
           • Lexical Analysis
               • Code Readability (CR)
               • Text Readability (TR)
           • Semantic Analysis
               • Topic Similarity (TS)
               • Topic Entropy (TE)
           • User Behaviour Analysis
               • Answer Rejection Ratio (ARR)
               • Last Access Delay (LAD)
           • Popularity Analysis
               • Votes for Questions (V)
               • Reputation of Question Owners (R)
       Dataset Used
       • 3,956 Unresolved questions & 4,101 Resolved questions
       • Each question has at least 10 answers.
  • 4. CODE & TEXT READABILITY
       • Existing readability tools used: Buse and Weimer (TSE, 2010) and Readability Grade levels (Ponzanelli et al., ICSME, 2014)
       • Distribution Fitting Curves of readability
       • No significant difference in readability between two types of questions.
  • 5. TOPIC SIMILARITY & TOPIC ENTROPY
       • Mallet (McCallum, 2002) for topic modeling
       • Topic Similarity (Fig-a) between questions and corresponding answers identical for both question types.
       • Topic Entropy (i.e., topic uncertainty) (Fig-b) higher for unresolved questions: unresolved questions are less specific about topics of requirement.
  • 6. USER BEHAVIOUR ANALYSIS
       • Distribution Fitting Curves of rejection ratio.
       • Owners of unresolved questions have greater answer rejection ratio.
       • Owners of unresolved questions are less frequent at Stack Overflow.
  • 7. POPULARITY ANALYSIS
       • Used Question Votes and User Reputation
       • Unresolved questions are less popular than resolved questions.
       • Owners of unresolved questions are less reputed.
  • 8. PREDICTION MODELS (RQ2)

       Algorithm             Metrics                  Overall     Unresolved Questions
                                                      Accuracy    Precision    Recall
       J48                   {TE, ARR, LAD, V, R}     78.11%      78.70%       76.10%
       J48                   {ARR, LAD, V}            77.90%      79.60%       73.90%
       Logistic Regression   {TE, ARR, LAD, V, R}     73.58%      72.60%       74.20%
       Logistic Regression   {ARR, LAD, V}            73.28%      71.70%       75.20%
       Naïve Bayes           {TE, ARR, LAD, V, R}     71.69%      69.50%       75.50%
       Naïve Bayes           {ARR, LAD, V}            74.48%      80.00%       64.00%

       • Three prediction models used from WEKA with 10-fold cross-validation.
       • 78.11% prediction accuracy with 78.70% precision and 76.10% recall.
       • The identified features are satisfactorily predictive.
  • 9. TAKE-HOME MESSAGE
       • 27% of SO questions are unresolved, and they are increasing almost exponentially.
       • Unresolved questions are ambiguous, less focused and less popular.
       • Owners of unresolved questions are less reputed and less frequent at SO.
       • Identified features can satisfactorily separate unresolved from resolved questions.
       • Findings can assist in question quality management at SO.
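
Slide 8 reports overall accuracy together with precision and recall for the unresolved class. The sketch below shows how those three figures relate to a list of predictions. It is only an illustration: the study used WEKA, the function name `unresolved_metrics` is hypothetical, and the toy labels are invented, not data from the paper.

```python
def unresolved_metrics(actual, predicted, positive="unresolved"):
    """Return (accuracy, precision, recall), treating `positive` as the class of interest."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labelled question is required")

    pairs = list(zip(actual, predicted))
    # True positives: unresolved questions predicted as unresolved.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False positives: resolved questions predicted as unresolved.
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # False negatives: unresolved questions predicted as resolved.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels, invented for illustration only (not data from the study).
actual = ["unresolved", "unresolved", "unresolved", "resolved",
          "resolved", "resolved", "resolved", "unresolved"]
predicted = ["unresolved", "unresolved", "resolved", "resolved",
             "unresolved", "resolved", "resolved", "unresolved"]

accuracy, precision, recall = unresolved_metrics(actual, predicted)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

In a 10-fold cross-validation such as the one named on the slide, each question is predicted once by a model trained on the other nine folds, and metrics like these are then computed over the pooled predictions.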

Editor's Notes

  1. Introduce yourself + introductory statements. Today, I am going to talk about our findings on unresolved questions at Stack Overflow.
  2. First, let's clarify what unresolved questions are. We call a question unresolved if it was posted at least 6 months ago and none of its answers has been accepted as a solution. Right now, 27% of SO questions are unresolved, and their number has increased almost exponentially over the last 6 years. So, in this paper we answer two research questions: Why do questions at Stack Overflow remain unresolved for long time? Can we develop a model that would predict unresolved questions?
  3. To answer RQ1, we conduct a comparative study between unresolved and resolved questions (those with an answer accepted as the solution) from SO. We collect about 4K questions of each type and compare them using four different analyses. Lexical analysis checks the readability of the code and text in the questions. Semantic analysis focuses on question-answer topic similarity and topic entropy. User behaviour analysis focuses on certain activities of the question owners. Popularity analysis compares question votes and user reputation for both types of questions.
  4. This slide shows the readability comparison between unresolved and resolved questions. Green is the readability distribution fit for resolved questions, and red is the same for unresolved questions. We find no significant difference in readability between the two question types.
  5. However, we got an interesting finding in the case of question topics. Using topic modeling and information theory, we calculate topic entropy (analogous to information entropy) for both resolved and unresolved questions. We found that topic entropy is higher for unresolved questions, which suggests that unresolved questions are less specific about their requirements, that is, less focused, and this probably prevents them from receiving satisfactory answers.
  6. In the user behaviour analysis, we found that owners of unresolved questions are relatively reluctant to accept answers as solutions, which suggests they are either careless or skeptical. Our analysis also shows that they visit SO less frequently.
  7. In the popularity analysis, we found that unresolved questions are less popular than resolved questions, and that owners of unresolved questions are generally less reputed than the owners of resolved questions.
  8. Now, in order to answer RQ2, we use the features identified in RQ1 and collect them for both question types (8K questions). We then develop 3 prediction models using J48, Logistic Regression and Naïve Bayes from WEKA, and apply 10-fold cross-validation. We found an overall classification accuracy of 78.11%, which is impressive. For unresolved questions, we found 78.70% precision and 76.10% recall, which suggests that the identified features are quite predictive.
  9. So, here are the take-home messages: 27% of SO questions are unresolved, and they are increasing almost exponentially. Unresolved questions are ambiguous, less focused and less popular. Owners of unresolved questions are less reputed and less frequent at SO. The features identified in this study are quite predictive for unresolved questions, so they can be used for question quality management.
  10. Thanks for your time. Questions!!
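
Note 5 describes topic entropy as analogous to information entropy, computed from the output of topic modeling. A minimal sketch of that idea follows, assuming each question comes with a vector of topic proportions (such as a topic model like Mallet can produce). The function name `topic_entropy`, the base-2 logarithm, and the two example distributions are assumptions made for illustration; the exact formula used in the paper is not given here.

```python
import math


def topic_entropy(topic_probs):
    """Shannon entropy (in bits) of one question's topic distribution.

    Higher values mean the question is spread across many topics,
    lower values mean it concentrates on a few.
    """
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic proportions must sum to a positive value")

    entropy = 0.0
    for weight in topic_probs:
        p = weight / total  # normalise so the proportions sum to 1
        if p > 0:           # treat 0 * log(0) as 0
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical topic proportions over four topics (not data from the study).
focused = [0.97, 0.01, 0.01, 0.01]  # dominated by a single topic
diffuse = [0.25, 0.25, 0.25, 0.25]  # spread evenly over all topics

print(f"focused: {topic_entropy(focused):.3f} bits")
print(f"diffuse: {topic_entropy(diffuse):.3f} bits")
```

Under this reading, a question whose text is spread evenly over many topics scores higher than one dominated by a single topic, which matches the note's description of unresolved questions as less focused.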