Software engineers share experiences with modern technologies by means of software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged.
To improve the quality of tags in software information sites, we propose EnTagRec, an automatic tag recommender based on historical tag assignments to software objects, and we evaluate its performance on four software information sites: StackOverflow, AskUbuntu, AskDifferent, and FreeCode.
We observe that EnTagRec achieves Recall@5 scores of 0.805, 0.815, 0.88, and 0.64, and Recall@10 scores of 0.868, 0.876, 0.944, and 0.753, on StackOverflow, AskUbuntu, AskDifferent, and FreeCode, respectively. Averaging across the four datasets, EnTagRec improves upon TagCombine, the state-of-the-art approach, by 27.3% in terms of Recall@5 and 12.9% in terms of Recall@10.
EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
1. EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
Shaowei Wang, David Lo, Bogdan Vasilescu, Alexander Serebrenik
@b_vasilescu @aserebrenik
2. / department of mathematics and computer science 25-9-2014 Page 2
7. ???
14. I have Java daemon which I want to pass shell commands. For example… P( )?
15. Actually (preprocessing…) P( )? Java daemon want pass shell command exampl daemon load configur possibl
16. Tags = nouns (phrases) P( )? Java daemon want pass shell command exampl daemon load configur possibl
17. P( | Java ) P( | daemon ) P( | shell ) … Estimate from the training data. Combine to get P for the entire text.
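The estimate-and-combine step on this slide can be sketched as a naive-Bayes-style computation over per-word tag probabilities. This is a minimal illustration with a toy training set, not the paper's exact Bayesian Inference Component; the function names and the smoothing constants are mine.

```python
from collections import defaultdict

def train(tagged_posts):
    """Count word/tag co-occurrences in the training data."""
    word_tag = defaultdict(lambda: defaultdict(int))
    word_total = defaultdict(int)
    for words, tags in tagged_posts:
        for w in words:
            for t in tags:
                word_tag[w][t] += 1
            word_total[w] += len(tags)
    return word_tag, word_total

def score(words, tag, word_tag, word_total, smooth=1.0, vocab=100):
    """Combine per-word estimates P(tag | word) into a score for the text,
    with additive smoothing so unseen pairs do not zero everything out."""
    p = 1.0
    for w in words:
        p *= (word_tag[w][tag] + smooth) / (word_total[w] + smooth * vocab)
    return p

# Toy training set: (stemmed words of a post, its tags)
posts = [(["java", "daemon", "shell"], ["java", "linux"]),
         (["java", "ide"], ["java", "eclipse"])]
wt, tot = train(posts)
# "java daemon" scores higher for the tag "java" than for "eclipse"
assert score(["java", "daemon"], "java", wt, tot) > \
       score(["java", "daemon"], "eclipse", wt, tot)
```

The per-word probabilities are estimated from the training data (the counting step), and the product combines them into a single score for the entire text, mirroring the two bullets on the slide.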
22. EnTagRec
• is better than BIC and FIC separately

Dataset        Metric  EnTagRec  BIC    FIC
StackOverflow  r@5     0.805     0.565  0.593
StackOverflow  p@5     0.346     0.232  0.258
AskUbuntu      r@5     0.815     0.505  0.637
AskUbuntu      p@5     0.358     0.212  0.282
AskDifferent   r@5     0.88      0.523  0.713
AskDifferent   p@5     0.369     0.212  0.298
FreeCode       r@5     0.64      0.391  0.545
FreeCode       p@5     0.382     0.230  0.322
23. EnTagRec
• is better than BIC and FIC separately
• is better than TagCombine
24. So, what was all this about?
Editor's Notes
Missing tags.
As tagging is inherently a distributed and uncoordinated process, similar objects are often tagged differently [40]. Such idiosyncrasy reduces the usefulness of tags, since related objects are not linked together by a common tag and relevant information becomes more difficult to retrieve. To illustrate the importance of tags for the well-functioning of a software information site, note that META STACK OVERFLOW has more than 4,000 questions related to tags, as opposed to only about 1,000 related to user interface.
Some synonyms are really strange
Stress that we have taken samples of these websites. For instance, for StackOverflow we have taken 47K questions with 437 tags…
FreeCode – in addition to free-form textual tags, objects carry tags such as the general application domain (e.g., Software Development for Apache Ant) and a more specific category within this domain (BuildTools).
Precision and recall: *averages*. Recall@5 is the average of |Top5Predicted ∩ Correct| / |Correct|, averaging over all software objects in the test set. Precision@5, similarly, is the average of |Top5Predicted ∩ Correct| / |Top5Predicted|.
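The averaged metrics defined in this note can be sketched as follows (a small illustration; the function names and the toy test set are mine):

```python
def recall_at_k(predicted, correct, k=5):
    """|top-k predicted ∩ correct| / |correct| for one software object."""
    hits = len(set(predicted[:k]) & set(correct))
    return hits / len(correct)

def precision_at_k(predicted, correct, k=5):
    """|top-k predicted ∩ correct| / |top-k predicted| for one object."""
    hits = len(set(predicted[:k]) & set(correct))
    return hits / len(predicted[:k])

def average_metric(metric, objects, k=5):
    """Average a per-object metric over the whole test set."""
    return sum(metric(p, c, k) for p, c in objects) / len(objects)

# Toy test set: (ranked predicted tags, true tags)
test = [(["java", "linux", "shell", "ide", "python"], ["java", "shell"]),
        (["python", "django", "web", "css", "html"], ["python"])]
print(average_metric(recall_at_k, test))     # prints 1.0: every true tag is in the top 5
print(average_metric(precision_at_k, test))  # prints 0.3: (2/5 + 1/5) / 2
```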
BIC and FIC return a set of tags with associated probabilities, e.g., B_o(t) is the probability that object o will be tagged t according to the Bayesian approach. K_Bayesian and K_Frequentist are the numbers of tags returned by each of the components (default 70); the Composer Component will return K tags (as in Recall@K, i.e., 5).
D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In EMNLP ’09, pages 248–256, 2009. (EMNLP = Empirical Methods in Natural Language Processing)
Tag [python] stresses that tags do not necessarily come literally from the text.
This is where the weights are coming from
We infer at most K_Frequentist tags; indeed, StackOverflow has more than 40,000 tags.
Spreading activation network. Each node in the network corresponds to a tag (all tags considered for the training set), and each edge connecting two nodes corresponds to the relationship between the corresponding tags. The weight of a node is initially zero. The weight of an edge is the ratio of the number of objects in the training set tagged with both tags to the number of objects tagged with at least one of the two tags.
The set of starting tags are the tags inferred by the approach presented so far, with weights as calculated above. So in this example we assume that in general six tags are considered (Java, Linux, Eclipse, Python, IDE, Project), but for a given software object the previous approach has inferred two tags: Java with probability 0.5 and Python with probability 0.6.
Max hop in this example is 2, and this is why we do not update the weight of Project in the first activation.
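The spreading-activation pass described in the three notes above can be sketched as follows. This is a hedged illustration, not the paper's exact update rule: the decay behavior is simplified to "gain = source weight × edge weight", and the edge weights below are invented values for the slide's six-tag example.

```python
def spread(edges, seeds, max_hops=2):
    """Start from seed tags with their initial weights and propagate
    activation along weighted edges for max_hops hops."""
    weights = dict(seeds)      # node weights; non-seed nodes start at 0
    frontier = dict(seeds)     # nodes activated in the previous hop
    for _ in range(max_hops):
        nxt = {}
        for node, w in frontier.items():
            for (a, b), ew in edges.items():
                if node in (a, b):
                    other = b if node == a else a
                    gain = w * ew              # simplified decay rule
                    weights[other] = weights.get(other, 0.0) + gain
                    nxt[other] = nxt.get(other, 0.0) + gain
        frontier = nxt
    return weights

# Edge weight = |objects tagged with both| / |objects tagged with either|
# (hypothetical values for the example graph)
edges = {("Java", "Eclipse"): 0.4, ("Java", "IDE"): 0.2,
         ("Python", "IDE"): 0.3, ("IDE", "Project"): 0.1}
# Seeds: the two tags inferred for this object by the previous approach
seeds = {"Java": 0.5, "Python": 0.6}
final = spread(edges, seeds, max_hops=2)
```

With max_hops = 2, Project (two edges away from both seeds) only receives weight on the second hop, matching the note above.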
alpha and beta can be tuned for any evaluation criterion EC; we use Recall@5, as used in the literature. Reminder: Recall@5 is the *average* of |Top5Predicted ∩ Correct| / |Correct|, averaging over all software objects in the test set.
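The composer step, weighting the two components' tag probabilities with alpha and beta and keeping the top K tags, can be sketched as a linear combination. The probabilities and weight values below are illustrative, not from the paper:

```python
def compose(bic, fic, alpha, beta, k=5):
    """Combine Bayesian (BIC) and frequentist (FIC) tag probabilities
    into one ranked list and keep the top-k tags."""
    tags = set(bic) | set(fic)
    combined = {t: alpha * bic.get(t, 0.0) + beta * fic.get(t, 0.0)
                for t in tags}
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Hypothetical component outputs for one software object
bic = {"java": 0.5, "linux": 0.2, "shell": 0.1}
fic = {"java": 0.4, "python": 0.3}
top = compose(bic, fic, alpha=0.6, beta=0.4, k=2)
# "java" ranks first: 0.6*0.5 + 0.4*0.4 = 0.46
```

Tuning then amounts to searching over (alpha, beta) pairs and keeping the pair that maximizes Recall@5 on held-out data.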
These values are for the optimal settings K_Bayesian = K_Frequentist = 70. With both K's equal to 10, the recall values are lower but still better than those of BIC/FIC on the StackExchange datasets (recall between 70% and 80%).
The r@5 and p@5 values in this table are averages, but we have also conducted a more thorough study using Dunnett-type contrasts, showing that EnTagRec indeed outperforms BIC and FIC, and that EnTagRec outperforms TagCombine on the StackExchange datasets.