Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empirical Study of Four Stack Exchange Websites
Journal First Presentation - Empirical Software Engineering
Shaowei Wang, Tse-Hsun (Peter) Chen, Ahmed E. Hassan
1
Developers are always facing problems
2
Technical Q&A websites provide platforms for developers to seek help from others
3
4
~10,000 new questions per day
5
~50 million monthly visitors
6
~13 million questions and ~24 million answers
7
Almost one million questions get their accepted answers after more than one week on Stack Overflow
8
What factors impact the speed of questions getting accepted answers?
9
We study the top four most popular Q&A websites in the Stack Exchange network
10
Selection criteria for studied questions:
• Questions that have an accepted answer
• Questions that have a score of at least 1
• Questions that are not self-answered
11
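The three selection criteria above can be read as a simple filter. The following is a minimal sketch, not the authors' script: the `Question` record and all of its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    # Hypothetical record; the field names are assumptions, not a real schema.
    question_id: int
    asker_id: int
    score: int
    accepted_answer_id: Optional[int]
    accepted_answerer_id: Optional[int]


def is_studied(q: Question) -> bool:
    """Apply the three selection criteria listed on the slide."""
    has_accepted_answer = q.accepted_answer_id is not None
    has_min_score = q.score >= 1
    not_self_answered = q.accepted_answerer_id != q.asker_id
    return has_accepted_answer and has_min_score and not_self_answered


def select_studied(questions: List[Question]) -> List[Question]:
    """Keep only the questions that satisfy all criteria."""
    return [q for q in questions if is_studied(q)]
```

Each criterion is kept as a separate named condition so that one of them can be relaxed without touching the others.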
55,853 questions
70,336 questions
7,134 questions
10,776 questions
12
We study the relationship between the studied factors and the speed of getting an accepted answer
Metrics calculation, Model construction, Model interpretation, Model assessment
13
Metrics calculation
Question (16 factors), Answer (4 factors), Asker (20 factors), Answerer (6 factors)
14
Model construction
Fast-answered questions (top 20%)
Slow-answered questions (bottom 20%)
Correlation & redundancy analysis
Non-linear logistic regression model building
15
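The split into fast-answered and slow-answered questions can be sketched as below. This is an illustration under stated assumptions: "top 20%" is read as the 20% of questions with the shortest time to an accepted answer, the group size is rounded down, and the function name `label_fast_slow` is hypothetical.

```python
def label_fast_slow(hours_to_accepted, fraction=0.2):
    """Label the fastest and slowest `fraction` of questions.

    Returns a dict mapping question index -> "fast" or "slow".
    Questions in the middle of the distribution are left out.
    """
    n = len(hours_to_accepted)
    k = int(n * fraction)  # group size; rounding down is an assumption
    # Indices ordered from the shortest to the longest waiting time.
    order = sorted(range(n), key=lambda i: hours_to_accepted[i])
    labels = {i: "fast" for i in order[:k]}
    labels.update({i: "slow" for i in order[n - k:]})
    return labels
```

Dropping the middle 60% leaves a binary outcome (fast vs. slow), which is what a logistic regression model needs.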
Model assessment
AUC
16
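For readers unfamiliar with AUC: it is the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. It is not the tooling used in the study; treating "slow" as the positive class (label 1) is an assumption made here for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probabilities (or any ranking score).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The pairwise loop is quadratic, so it suits small examples only.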
Model interpretation
Explanatory power (Wald χ² test)
Relationship visualization
17
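One way to turn per-factor Wald χ² values into a percentage of explanatory power per dimension is to normalize each factor's χ² by the total and sum within a dimension. The sketch below assumes exactly that normalization. The function name and the input layout are hypothetical, and the exact procedure used in the study may differ.

```python
def explanatory_power_share(chi2_by_factor, dimension_of):
    """Percentage of total Wald chi-square attributed to each dimension.

    chi2_by_factor: factor name -> Wald chi-square value (assumed given).
    dimension_of:   factor name -> dimension name, e.g. "Question",
                    "Asker", "Answer", or "Answerer".
    """
    total = sum(chi2_by_factor.values())
    if total <= 0:
        raise ValueError("total chi-square must be positive")
    shares = {}
    for factor, chi2 in chi2_by_factor.items():
        dim = dimension_of[factor]
        shares[dim] = shares.get(dim, 0.0) + 100.0 * chi2 / total
    return shares
```

The resulting shares sum to 100 across dimensions, which makes the four websites directly comparable.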
Our models achieve an AUC of 0.85-0.95
AUC=0.95
AUC=0.94
AUC=0.85
AUC=0.86
18
Our models have a good enough fit for interpretation.
19
Top 1 factor: past speed of answering questions of an answerer
20
A question tends to receive a fast accepted answer from answerers who previously answered questions fast
[Plot: probability of getting a slow accepted answer (y-axis) vs. past speed of answering questions of an answerer (hours, logarithmic scale) (x-axis)]
21
A wide confidence interval indicates that the relationship is less clear due to the lack of data points in that data range.
22
23
Top 2 factor: length of body of a question
24
A long question tends to receive a slow accepted answer
[Plot: probability of getting a slow accepted answer (y-axis) vs. length of body of a question (characters, logarithmic scale) (x-axis)]
25
Top 3 factor: past speed of getting accepted answers of tags of a question
26
A question with tags that received accepted answers fast tends to receive a fast accepted answer
[Plot: probability of getting a slow accepted answer (y-axis) vs. time of getting accepted answers of tags of a question in the past (hours, logarithmic scale) (x-axis)]
27
Fast accepted answers rely heavily on the answerer
[Bar chart: % of explanatory power (y-axis, 0-70) per dimension (Question, Asker, Answer, Answerer) for Stack Overflow, Mathematics, Ask Ubuntu, and Super User]
28
29
Suggestions for Technical Q&A website designers
Deliver questions to the right answerers and motivate them to answer questions faster.
30
86%-96% of the accepted answers are posted by answerers who had answered more than 5 questions before
31
Non-frequent answerers vs. frequent answerers
• Non-frequent answerers (<= 5 answers): people who posted no more than 5 answers in the past
• Frequent answerers (> 5 answers): people who posted more than 5 answers in the past
32
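The grouping above, together with a per-group mean time to an accepted answer, can be sketched as follows. This is an illustrative sketch only: the function names and the `(past_answer_count, hours)` input layout are hypothetical, and only the threshold of 5 comes from the slide.

```python
def answerer_group(past_answer_count, threshold=5):
    """Classify an answerer by the number of answers posted in the past."""
    return "frequent" if past_answer_count > threshold else "non-frequent"


def mean_hours_by_group(records, threshold=5):
    """Mean time (hours) to post an accepted answer, per answerer group.

    records: iterable of (past_answer_count, hours_to_accepted_answer).
    """
    totals = {}
    for past_answer_count, hours in records:
        group = answerer_group(past_answer_count, threshold)
        running_sum, n = totals.get(group, (0.0, 0))
        totals[group] = (running_sum + hours, n + 1)
    return {group: s / n for group, (s, n) in totals.items()}
```

Keeping the threshold as a parameter makes it easy to check how sensitive a comparison is to the cut-off of 5 answers.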
Non-frequent answerers are the bottleneck for fast answers
[Chart: mean time of posting an accepted answer (hours)]
33
The current incentive system only motivates frequent answerers well, but not non-frequent answerers
34
Non-frequent answerers are answering questions that are as important as the ones answered by frequent answerers
[Chart: mean score of questions]
35
Suggestions for Technical Q&A website designers
Improve the incentive system to attract non-frequent answerers to become more active.
36
Frequent answerers tend to answer shorter questions
37
Frequent answerers probably game the incentive system
"Yeah, some folks are going to specialize in super-fast answers to easy questions and get more rep points than deserved,… The bigger problem is that this has the side effect of causing interesting but more difficult questions to get ignored. Typical example: someone asks a question that gets a lot of views and two or more upvotes, but it's hard enough that no one can answer within an hour or so."
38
Suggestions for Technical Q&A website designers
Improve the incentive system to factor in the value and difficulty of questions.
39
41
42
43
44
Shaowei Wang
shaowei@cs.queensu.ca
51

Editor's Notes

  • #2 Hi, thanks for the introduction and for coming. I am Shaowei, a postdoc at Queen's University. Today I will present our paper, "Understanding the Factors for Fast Answers in Technical Q&A Websites". This paper is joint work with Peter from Concordia and Ahmed from Queen's.
  • #3 Developers keep facing problems, whether they are doing development, testing, or maintenance. Problems fill developers' lives.
  • #4 To help
  • #5 Developers spend 58% of their time on comprehension activities. ~50 million monthly visitors
  • #8 In other words, developers ask questions very frequently.
  • #9 The median waiting time for a question to get an answer is 0.5 hours in general. How to shorten the waiting time to get an accepted answer is an interesting question to study.
  • #10 To understand the factors that impact the speed of getting accepted answers, and to provide insights for users and website designers to improve their systems.
  • #11 To achieve this goal, we study the four most popular websites in the Stack Exchange network.
  • #12 We select the questions that have a score of at least 1, because we want to make sure that the question has received enough attention from the community and that its quality is reasonable.
  • #13 55k from Stack Overflow
  • #14 To understand the factors that may impact the speed of getting an accepted answer for a question.
  • #16 The reason we only select the top 20% and bottom 20% is that we want to find the factors that really impact the really slow and really fast questions.
  • #17 Remove oopti and table as well.
  • #18 Inset equation x2
  • #19 Logo and auc
  • #20 Logo and auc
  • #22 A wide gray area means a larger confidence interval, i.e., the relationship is less clear. The probability of getting a slow answer increases significantly as the value of A Median Speed Answer increases, up to an inflection point, with a small confidence interval (i.e., the gray bands are narrow). After the inflection point, the probability goes down slowly, with a wide confidence interval and thus a larger uncertainty (i.e., the relationship is less clear due to the lack of data points in that data range).
  • #24 More importantly, this finding holds across the different sites.
  • #25 Speed for an answerer to answer questions in the past; length of an answer body (controlling factor); length of a question body
  • #28 Tag also matters.
  • #29 Logo and number
  • #30 In general, fast accepted answers rely on the people who answer the question.
  • #35 We look at the improvement in reputation score for people with different reputation scores.
  • #36 There are non-frequent answerers
  • #37 The questions that are answered by non-frequent answerers are as important as those answered by frequent answerers. However, non-frequent answerers are the bottleneck for fast answers. A possible explanation is that some new questions require concrete knowledge that only such non-frequent answerers have. Such non-frequent answerers do not actively stay on SO, and therefore delay the answers.
  • #38 Long title
  • #39 To find a possible reason for this, we explore the posts on Stack Overflow Meta.
  • #49 SELECT TOP 5 * FROM Posts AS a JOIN Posts AS q ON q.AcceptedAnswerId = a.Id WHERE DATEDIFF(week, q.CreationDate, a.CreationDate) > 1
  • #50 Developers spend 58% of their time on comprehension activities.