How are project-specific forums utilized?
A study of participation, content, and sentiment
in the Eclipse ecosystem
Published online in the Journal of Empirical Software Engineering
29 September 2021
Yusuf S. Nugroho, Syful Islam, Keitaro Nakasai, Ifraz Rehman, Hideaki Hata, Raula G. Kula, Mei Nagappan, Kenichi Matsumoto
Forums support mass communication and coordination among
distributed software development contributors (Storey et al., 2017)
[Figure: contributor roles: senior software developer, software analyst, junior software developer, software users, project owner]
Such communication channels
seem to have similarities to a forum
• Mailing list (Guzzi et al. 2013; Zagalsky et al. 2018)
• Microblog (Guzman et al. 2016; Mezouar et al. 2018)
• News aggregator (Aniche et al. 2018)
Generic Q&A platforms such as Stack Overflow (SO)
have taken over the roles of forums
• Many software development projects have closed their self-supported forums and moved to Stack
Overflow due to its large-scale community scope and benefits (Squire, 2015)
• Stack Overflow's gamification strategies, such as badges and a voting system, are considered
incentives that encourage participation and improve answer quality.
On the other hand, Eclipse still maintains
an established and active forum
The Eclipse Foundation was created in January 2004
Research Questions
• RQ1: How does participation differ according to
membership statuses?
• RQ2: How does communication content vary for different
membership statuses?
• RQ3: What are the differences perceived in the sentiment
of interaction based on membership statuses?
Connected data from Gerrit, Bugzilla, Forums, and
Projects via REST API in the Eclipse ecosystem
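The speaker notes for this slide say that a user profile links to Projects through uid, to Gerrit and Bugzilla through email, and to Forums through name or email. A minimal sketch of that linkage is given here, assuming the records have already been fetched and parsed into dictionaries. The class name, field names, and record layout are hypothetical and are not taken from the Eclipse REST API.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    # Hypothetical profile shape; only the key names (uid, name, email)
    # come from the speaker notes.
    uid: str
    name: str
    email: str
    activity: dict = field(default_factory=dict)


def link_records(profiles, projects, gerrit, bugzilla, forums):
    """Count per-system activity for each profile using the stated link keys.

    projects: dicts with "uid"; gerrit and bugzilla: dicts with "email";
    forums: dicts with "name" and/or "email" (assumed layout).
    """
    by_uid = {p.uid: p for p in profiles}
    by_email = {p.email: p for p in profiles}
    by_name = {p.name: p for p in profiles}

    def bump(profile, system):
        # Records that match no profile are skipped.
        if profile is not None:
            profile.activity[system] = profile.activity.get(system, 0) + 1

    for rec in projects:
        bump(by_uid.get(rec.get("uid")), "projects")
    for rec in gerrit:
        bump(by_email.get(rec.get("email")), "gerrit")
    for rec in bugzilla:
        bump(by_email.get(rec.get("email")), "bugzilla")
    for rec in forums:
        # Assumption: try email first, then fall back to the display name.
        match = by_email.get(rec.get("email")) or by_name.get(rec.get("name"))
        bump(match, "forums")
    return profiles
```

Matching on display name is the weakest of these links, since names are not guaranteed to be unique. A real pipeline would need a disambiguation step that this sketch leaves out.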
Results of data extraction
Data source:
https://www.eclipse.org/forums/
Steps of data collection
Step                                # Threads
Step 1: Raw extraction              1,097,174
Step 2: Remove duplication            832,058
Step 3: Separation:
  (i) Threads by webmaster            542,997
  (ii) Threads by non-webmaster       289,061
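The deduplication and separation steps can be sketched as follows. This is an illustration only: the `thread_id` and `author` field names are hypothetical, and treating the thread identifier as the duplicate key is an assumption, since the slides do not say how duplicates were detected. The author name "Eclipse Webmaster" is the one used in the slides.

```python
def split_threads(raw_threads, webmaster_name="Eclipse Webmaster"):
    """Remove duplicate threads, then separate webmaster and user threads.

    raw_threads: list of dicts with hypothetical keys "thread_id" and "author".
    Returns (unique, webmaster_threads, user_threads).
    """
    seen = set()
    unique = []
    for thread in raw_threads:
        # Step 2: keep only the first occurrence of each thread id (assumed key).
        if thread["thread_id"] in seen:
            continue
        seen.add(thread["thread_id"])
        unique.append(thread)

    # Step 3: separate by author.
    webmaster_threads = [t for t in unique if t["author"] == webmaster_name]
    user_threads = [t for t in unique if t["author"] != webmaster_name]
    return unique, webmaster_threads, user_threads


# Consistency check on the reported counts: the two Step 3 groups
# add up to the Step 2 total.
assert 542_997 + 289_061 == 832_058
```

Because Step 3 partitions the deduplicated set, the two groups always sum to the Step 2 count, which is what the reported figures show.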
Preliminary Study:
Membership Classification in Eclipse Forums
[Figure: users classified into Juniors, Members, and Seniors]
Senior members tend to use the forum
more frequently
RQ1: How does participation differ according to
membership statuses?
1. RQ1.1: Shape of contributors’ participation
• We use a Topological Data Analysis (TDA) technique
(Lum et al. 2013)
2. RQ1.2: Membership-based participation
• We divided the type of threads into two categories,
(i) bug-related threads, and (ii) non-bug-related threads.
RQ1.1:
Shape of contributors’ participation
[Figure: four panels with the Forums regions labeled: (a) color activity by thread (Forums), (b) color activity by bug submission (Bugzilla), (c) color activity by review submission (Gerrit), (d) color activity by committer (Projects). Legend: color scale from lower rate to higher rate.]
Summary:
We find that active contributors are also active in forums,
making it a source of expert knowledge for all systems
RQ1.2:
Membership-based participation
Answerers (rows) by Askers (columns)

Bug-related Threads (%)
Answerers   Juniors  Members  Seniors  Sum (%)
Juniors        15.4      2.7      1.3     19.4
Members        11.7      1.7      1.0     14.5
Seniors        51.1     10.6      4.4     66.1
Sum            78.8     15.1      6.8    100.0

Non-bug-related Threads (%)
Answerers   Juniors  Members  Seniors  Sum (%)
Juniors        30.3      3.9      5.2     39.3
Members         6.9      5.0      3.2     15.2
Seniors        19.2      5.8     20.5     45.5
Sum            56.4     14.7     28.9    100.0
• Compared to Juniors and Members, Seniors are the most
active forum users in responding to threads.
• For bug-related threads, Seniors contribute far more
than Juniors and Members.
• For non-bug-related threads, Juniors responded more to Juniors’ initial
posts, and Seniors responded more to Seniors’ initial posts.
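A percentage table of this shape can be computed from a list of answerer and asker status pairs. The sketch below assumes one pair per counted response. The slides do not state the exact unit of counting, so the input format and the function name are hypothetical.

```python
from collections import Counter

STATUSES = ("Juniors", "Members", "Seniors")


def participation_table(pairs):
    """Build an answerer-by-asker table of percentages.

    pairs: list of (answerer_status, asker_status) tuples (assumed input).
    Each cell is the share of all pairs, so the row sums add up to 100.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pairs to tabulate")

    table = {}
    for answerer in STATUSES:
        row = {
            asker: 100.0 * counts[(answerer, asker)] / total
            for asker in STATUSES
        }
        row["Sum"] = sum(row.values())
        table[answerer] = row
    return table
```

Reading a row shows how much of the answering is done by one status. Reading a column shows whose initial posts receive those answers.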
RQ2: How does communication content vary
for different membership statuses?
1. Threads by Organization & Threads by Users
• Classify the messages posted by the organization (“Eclipse Webmaster”)
• Classify the messages posted by users based on the forum names and
topic categories
2. Forums linkage
• Manually characterize the link targets using the coding guide by Hata et al.
(2019)
3. Membership-based content
• Manually extract the content types using the coding guides by Beyer et
al. (2020) and Treude et al. (2011)
RQ2.1: Threads by Organization & Users
[Figure: Webmaster’s message pattern, bar chart with values 58%, 29%, 3%, 1%, 1%, 1%, 1%, 0%, 6%]
[Figure: Top 10 topics in Eclipse forum, bar chart with an axis from 0 to 30,000]
• Reply, welcome, and closure messages are the messages
most frequently posted by the webmaster;
• Eclipse Platform, BIRT, and the forums for Newcomers
are the most active topics discussed by users.
RQ2.2: Forums Linkage
Top 5 link target types in the sample
• Bug report: 15%
• Other documentation: 11%
• Personal/organizational homepage: 10%
• Product/project homepage: 7%
• API documentation: 5%
RQ2.3: Membership-based content
Threads (%)     Juniors  Members  Seniors
Discrepancy        39.9     30.0     15.3
Conceptual          8.1     13.1     16.2
Information         1.6      4.4     11.9
Announcement        0.8      1.3      5.7
RQ3: What are the differences perceived in
the sentiment of interaction based on membership statuses?
[Figure: analysis workflow. Initial posts and first replies are annotated with the coding guide (Kauffeld and Lehmann-Willenbrock, 2012) and analyzed for polarity and social interaction.]
RQ3.1: Polarity Analysis
[Figure: polarity of posts by Juniors, Members, and Seniors]
Juniors and Members post more positive messages; Seniors post more neutral messages
RQ3.2: Social Interaction
• PPc = Positive procedural
• PS = Positive socioemotional
• PPa = Positive proactive
• NP = Negative procedural
• NS = Negative socioemotional
• NC = Negative counteractive
Users are encouraged to post positive procedural threads
to get more positive responses
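One way to examine this relationship is to tally, for each interaction code of the initial post, the share of first replies annotated as positive. The sketch below assumes the manual annotations are available as (initial code, reply polarity) pairs. The input format, the polarity labels, and the function name are hypothetical; only the six codes come from the slide.

```python
from collections import defaultdict

# Interaction codes listed on the slide.
CODES = ("PPc", "PS", "PPa", "NP", "NS", "NC")


def positive_reply_share(pairs):
    """Share of positive first replies per initial-post code.

    pairs: iterable of (initial_code, reply_polarity), where reply_polarity
    is assumed to be "positive", "neutral", or "negative".
    Only codes that occur in the input appear in the result.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for initial_code, reply_polarity in pairs:
        totals[initial_code] += 1
        if reply_polarity == "positive":
            positives[initial_code] += 1
    return {code: positives[code] / totals[code] for code in totals}
```

A higher share for PPc than for the other codes would be consistent with the slide's message, but this sketch only shows the tally, not the study's actual analysis.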
Conclusion
Our empirical study has shown that:
• Forums are an essential platform for linking various
resources in the Eclipse ecosystem and a source of expert
knowledge for other development systems.
• The results also reveal that forum members actively
participate in posting and responding to threads regardless
of membership status.
• The forums are dominated by question-and-answer threads,
with the status of users affecting the content.
• The sentiment of interaction among developers is
inconsistent.
Recommendation:
for Forum Users
• Join project-specific forum discussions, as they serve as a central
knowledge-sharing platform. Users receive knowledge not only
from the community, but also from external sources.
• Identify the appropriate forum topic. Sharing problems in a compatible
topic category maximizes the probability of getting responses and
solutions. Otherwise, the questions might be marked out of scope.
• Share knowledge in a positive manner. Since the clarity of knowledge
shared in the forums might increase the chances of getting responses,
describing problems in a procedural way is essential.
Recommendation:
for software development projects
• Consider preparing project-specific forums
• Eclipse forums are not limited to question-and-answer
content; they also enable developers to share
Eclipse community-related information and announcements.
• GitHub Discussions could be an option for projects hosted
on GitHub (Hata et al. 2022).
Discussion
Publications
• Nugroho, Y.S., Islam, S., Nakasai, K. et al. How are project-specific forums utilized? A study of participation, content, and sentiment in the Eclipse ecosystem. Empirical Software Engineering 26, 132 (2021).
• Published online on 29 September 2021
• Available at: https://doi.org/10.1007/s10664-021-10032-2
Yusuf Sulistyo Nugroho
yusuf.nugroho@ums.ac.id


Editor's Notes

  • #2 Hello everyone, this is Yusuf Sulistyo Nugroho, from Universitas Muhammadiyah Surakarta, Indonesia. On behalf of all the authors, I would like to present my research entitled “How are project-specific forums utilized? A study of participation, content, and sentiment in the Eclipse ecosystem”.
  • #3 So, what’s the background of this work? As stated in prior work by Storey et al. (2017), forums support mass communication and coordination among distributed software development contributors, such as senior and junior developers, project owners, users, or even software analysts.
  • #4 To bridge their discussion, they utilize communication channels that seem to have similarities to an online forum, such as mailing lists, microblogs, and news aggregators.
  • #5 However, recent studies show that generic question-and-answer platforms, such as Stack Overflow, have taken over the roles of forums. Many software development projects have closed their self-supported forums and moved to Stack Overflow due to its large-scale community scope and benefits.
  • #6 On the other hand, Eclipse, whose foundation was created in January 2004, still maintains an established and active forum. Although many software development projects have moved their developer discussion forums to generic platforms such as Stack Overflow, Eclipse has been steadfast in hosting its self-supported community forums. While recent studies show that forums share similarities with generic communication channels, it is unknown how project-specific forums like the Eclipse forums are utilized. Based on this motivation, we formulate 3 research questions, as follows:
  • #8 In this study, we used the 4 sources that connect to the user profile in the Eclipse ecosystem: Gerrit, Bugzilla, Forums, and Projects. The user profile links to Projects through uid, to Gerrit through email, to Bugzilla through email, and to Forums through name or email.
  • #9 This is one example of a forum thread that we used in this study. In the first step, we collected more than 1 million raw threads. From these data, we then removed the duplicated threads. Since we would like to investigate the characteristics of threads posted by the webmaster and by the users, we separated the threads. The Eclipse webmaster is a person or automated system in charge of posting messages on behalf of the organization. This step produces around 540 thousand webmaster threads and around 290 thousand non-webmaster threads.
  • #10 Since Eclipse distinguishes its general users based on membership status, we classified the users into 3 types: Junior member, Member, and Senior member. The user status may change from Junior to Member, and from Member to Senior, depending on the contributions of the user in the community. To investigate further, we identified the messages posted by each member.
  • #11 Based on our investigation, we found that senior members have used the forum more often than the other types of member. From this finding, we then analyze participation further, which is formulated in research question 1.
  • #12 To answer research question 1, we studied the shape of contributors’ participation and the participation based on the membership.
  • #13 In RQ1.1, we investigated how contributors use multiple systems by using a TDA technique. From the results, as can be seen in the figures, we found that active contributors are likely to be also active in forums, making it a source of expert knowledge for all systems.
  • #15 Since the Eclipse forum is one of the ecosystem-specific channels, we also analyze the users' participation based on membership, which is formulated in RQ1.2.
  • #16 The results show that, in general, Seniors are the most active forum users in responding to threads, compared to Juniors and Members.
  • #17 In addition, Seniors have also contributed more than Juniors and Members in the bug-related discussion.
  • #18 While in the non-bug related discussion, Juniors have responded more to Juniors, and Seniors have responded more to Seniors’ initial posts.
  • #19 In the second research question, we studied the content of threads based on the membership statuses. In detail, we classified the messages posted by the organization, as well as by the users. Furthermore, we also manually investigated the forum linkage and the membership-based content types.
  • #20 The findings show that reply, welcome, and closure messages are the messages most frequently posted by the webmaster, while Eclipse Platform, BIRT, and the forums for Newcomers are the most active topics discussed by users. Topic abbreviations: BIRT (Business Intelligence and Reporting Tools), EMF (Eclipse Modeling Framework), C/C++ IDE (CDT), Java Development Tools (JDT), Standard Widget Toolkit (SWT), Rich Client Platform (RCP), TMF (Textual Modeling Framework) (Xtext), GMF (Graphical Modeling Framework).
  • #21 It is also common in forum discussions to reference other resources, like bug reports, documentation, etc., which indicates that forums are an essential platform for linking various resources in the Eclipse ecosystem.
  • #22 To find out the type of discussion, we categorize the threads through a manual analysis of a representative sample. Three authors were involved in this analysis. We found that discrepancy-type discussions are mostly posted by Juniors and Members. In contrast, Seniors tend to discuss more conceptual topics. This shows that the higher the status, the more conceptual the content type that is communicated in Eclipse forums.
  • #23 In research question 3, we manually analyze the differences perceived in the sentiment of interaction based on membership statuses. To answer this question, we focus only on the initial posts and their first replies. This is because the first reply in each thread shows a user's tangible reaction to the initial post. Using the coding guide, we manually annotate the messages to classify the type of polarity and interaction.
  • #24 The polarity of the main posts and first responses among developers shows that Juniors and Members post more positive messages than Seniors. On the other hand, Seniors tend to post more neutral messages compared to the other two types of users.
  • #25 The result of the interaction sentiment analysis shows that procedural posts tend to receive more responses than other types of posts. This is indicated by positive and negative procedural messages receiving the highest number of first responses. Furthermore, posting a message in a positive procedural way has a higher chance of getting positive replies. In contrast, negative responses are frequently addressed to negative procedural messages. In sum, the results hint that users are encouraged to post procedural threads, thus increasing the probability of receiving answers.
  • #27 So, based on the findings of this work, we recommend that forum users join project-specific forum discussions, identify the appropriate forum topic, and share knowledge in a positive manner.
  • #28 For software development projects, we recommend considering a project-specific discussion forum. For example, GitHub Discussions could be an option for projects hosted on GitHub.
  • #29 To read the details of our study, please read our paper on the available link or you may scan the QR code available on this slide.
  • #30 That’s all I have in my presentation. Thank you for listening.