Programming errors and exceptions are inherent in software development and maintenance, and in today's Internet era, software developers often search the web for working solutions. They use a search engine to retrieve relevant pages, and then look for an appropriate solution by manually going through the pages one by one. However, both checking a page's content against a given exception (and its context) and then working out an appropriate solution are non-trivial tasks. They are even more complex and time-consuming given the bulk of irrelevant (i.e., off-topic) and noisy (e.g., advertisements) content in a web page. In this paper, we propose an IDE-based and context-aware page content recommendation approach that locates and recommends relevant sections of a web page by exploiting the technical details, in particular the context, of an exception encountered in the IDE. A preliminary evaluation with 250 web pages related to 80 programming errors and exceptions, and a comparison against one existing approach, show that the proposed approach is highly promising in terms of precision, recall and F1-measure.
ContentSuggest: Recommendation of Relevant Sections from a Webpage about Errors & Exceptions
1. RECOMMENDING RELEVANT SECTIONS FROM A WEBPAGE ABOUT PROGRAMMING ERRORS AND EXCEPTIONS
Mohammad Masudur Rahman, and Chanchal K. Roy
Software Research Lab, Department of Computer Science
University of Saskatchewan, Canada
25th Center for Advanced Studies Conference (CASCON 2015)
3. SOLVING EXCEPTION (STEP I: QUERY SELECTION)
Selection of traditional search query
Switching to web browser for web search
This query may not be sufficient for most of the exceptions
4. SOLVING EXCEPTION (STEP II: WEB SEARCH)
The browser does NOT know the context (i.e., details) of the exception.
Not much helpful ranking
Forces the developer to SWITCH back and forth between IDE and browser.
Trial and error in searching
19% of development time in web search
Switching is often distracting
5. SOLVING EXCEPTION (STEP III: MAPPING TO PAGE SECTIONS)
Mapping between the exception & relevant page sections: non-trivial
Automated mapping between exception & relevant page sections
IDE-based web page content suggestion for review
6. OUTLINE OF THIS TALK
ContentSuggest Architecture
Metrics & Algorithm
Empirical evaluation & validation (using webpages)
Validation with IR techniques (using SO posts)
Conclusion
13. EXPERIMENTS
Dataset: 80 exceptions + 250 web pages
Manual analysis (25 hours) → Gold sections
Evaluation: ContentSuggest vs. baseline (Sun et al.)
Validation: SO posts from the Stack Overflow crowd, ContentSuggest vs. IR (VSM, LSI)
14. PERFORMANCE METRICS
Precision (P): % of the retrieved content (a) that belongs to the gold content (b) of the page.
Recall (R): % of the gold content (b) that is retrieved (a) by the technique.
F1-measure (F1): Combination of Precision (P) & Recall (R).

P = |LCS(a, b)| / |a|
R = |LCS(a, b)| / |b|
F1 = (2 * P * R) / (P + R)
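The three formulas above can be sketched directly in code. A minimal Python sketch of the word-level LCS metrics (function names here are illustrative, not from the paper):

```python
def lcs_words(a, b):
    """Length of the longest common subsequence of words between two texts."""
    x, y = a.split(), b.split()
    # Classic dynamic-programming LCS over word sequences.
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]

def prf1(retrieved, gold):
    """Precision, recall and F1 based on word-level LCS, as on the slide."""
    overlap = lcs_words(retrieved, gold)
    p = overlap / len(retrieved.split()) if retrieved.split() else 0.0
    r = overlap / len(gold.split()) if gold.split() else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, `prf1("null pointer exception fix", "fix the null pointer exception")` gives P = 3/4 and R = 3/5, since the longest common word subsequence is "null pointer exception".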
15. RESEARCH QUESTIONS (4)
RQ1: How effective is ContentSuggest in
recommending relevant content from a web page?
RQ2: How effective are the proposed metrics in
identifying relevant page content?
RQ3: Can ContentSuggest outperform the baseline
technique?
RQ4: Does ContentSuggest perform better than IR
techniques (VSM, LSI) in identifying relevant content?
17. ANSWERING RQ3: COMPARISON WITH BASELINE TECHNIQUE

Content Extractor                      Metric   SO Pages   Non-SO Pages   All Pages
Sun et al. (SIGIR 2011)                MP       52.63%     38.89%         44.44%
                                       MR       86.49%     41.84%         59.88%
                                       MF       62.57%     34.49%         45.84%
ContentSuggest (Proposed Technique)    MP       92.64%     74.60%         81.96%
                                       MR       74.17%     78.51%         76.74%
                                       MF       80.95%     73.09%         76.30%

[ SO = Stack Overflow, MP = Mean Precision, MR = Mean Recall, MF = Mean F1-measure ]

Performed better for all 3 sets of pages: SO pages, Non-SO pages, and All pages
Performed better for all metrics: precision, recall and F1-measure
21. THREATS TO VALIDITY
Gold content preparation: Despite cross-validation, it may contain subjective bias.
Limited training dataset: Metric weights were trained on a limited dataset.
Usability concern: A full-fledged user study is required to validate the applicability of the technique. A limited study was performed with 6 participants.
22. TAKE-HOME MESSAGE
19% of development time is spent simply in web search (Brandt et al., SIGCHI 2009)
Mapping between information in the IDE and in a web page can be non-trivial and time-consuming.
ContentSuggest automates such mapping in the context of exception handling.
Content Density and Content Relevance are found effective in identifying relevant sections from a web page.
ContentSuggest outperforms one baseline technique and two IR techniques (VSM, LSI).
24. REFERENCES
[1] J. Brandt, P. J. Guo, J. Lewenstein, M. Dontcheva, and S. R. Klemmer. Two Studies of Opportunistic Programming: Interleaving Web Foraging, Learning, and Writing Code. In Proc. SIGCHI, pages 1589-1598, 2009.
[2] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering Traceability Links between Code and Documentation. TSE, 28(10):970-983, 2002.
[3] A. Marcus and J. I. Maletic. Recovering Documentation-to-Source-Code Traceability Links Using Latent Semantic Indexing. In Proc. ICSE, pages 125-135, 2003.
[4] F. Sun, D. Song, and L. Liao. DOM Based Content Extraction via Text Density. In Proc. SIGIR, pages 245-254, 2011.
[5] L. Ponzanelli, A. Bacchelli, and M. Lanza. Seahawk: Stack Overflow in the IDE. In Proc. ICSE, pages 1295-1298, 2013.
[6] M. M. Rahman, S. Yeasmin, and C. K. Roy. Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions. In Proc. CSMR-WCRE, pages 194-203, 2014.
[7] ContentSuggest Web Portal. URL http://www.usask.ca/~mor543/contentsuggest
[8] C. K. Roy and J. R. Cordy. NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization. In Proc. ICPC, pages 172-181, 2008.
Editor's Notes
Introduce yourself +introductory statements.
Today, I am going to talk about how to identify and extract relevant sections automatically from a web page for programming errors and exceptions.
A programming exception is a very frequent and common experience for software programmers and developers.
Once an exception is encountered, the developer identifies the target source line that triggers the exception.
They may do some debugging.
However, often they go for a web search for a quick solution.
In case of web search, the first step is query selection.
Often people choose the exception message, the very first line of the stack trace, as a search query, although it might not be sufficient most of the time.
The next step is, of course, context switching: they switch from the IDE to the web browser.
The second step is performing the web search itself.
However, this search is often not very productive. Studies showed that developers spend about 19% of their development time in web search.
Besides, there are some practical challenges.
The browser does not know the detailed context of the exception in the IDE, thus the returned pages are not very effective.
There is a constant switching between IDE and web browser, which is time-consuming.
However, the most time-consuming step is probably the mapping between the problem details in the IDE and the information in the web page.
Such mapping is non-trivial.
It also becomes difficult since they are in two different contexts, the IDE and the browser, which are not connected.
In this paper, we provide automation support for such mapping between error details and relevant sections of the web page.
Our tool provides the whole support inside the IDE, which resolves the context-switching problem as well.
This is the outline of today's talk.
I will first focus on the architecture of our proposed tool– ContentSuggest.
Then we will dive into the proposed metrics and the algorithm for relevant content identification.
Then we perform two types of experiments:-- (1) evaluation using web pages, and (2) evaluation using SO posts.
Then, I will conclude the talk with take-home messages.
This is the system architecture of our tool– ContentSuggest.
Suppose we have a search engine embedded in the IDE, and it returns a list of results for an exception encountered.
Now, a developer wants to explore the search results. So, our process starts when the developer clicks a web page.
Once clicked, the page URL is sent to the content extractor module.
The extractor module then collects the page content and analyzes the DOM tree of the page.
It determines content quality and relevance using the exception details from the IDE, and applies the different proposed metrics, which we will discuss in a minute.
The different sections of the page are ranked, and the top-ranked section in terms of content quality & relevance is returned to the IDE.
The IDE then shows that section.
The idea is that the developer would check only the most relevant part of a page.
If satisfied, she can check the whole page for further analysis. This way, she doesn't need to go through a number of pages all the time.
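The flow just described could be sketched roughly as follows; every function name and parameter here is a hypothetical placeholder, not the tool's actual API:

```python
# Hypothetical sketch of the suggestion pipeline described above.
# fetch, parse_dom, and score stand in for the content collection,
# DOM analysis, and metric computation steps of the real tool.

def suggest_section(page_url, exception_context, fetch, parse_dom, score):
    """Return the top-ranked section of a clicked page for an exception."""
    html = fetch(page_url)        # collect the page content
    sections = parse_dom(html)    # analyze the DOM tree into sections
    ranked = sorted(sections,
                    key=lambda s: score(s, exception_context),
                    reverse=True) # rank by content quality & relevance
    return ranked[0] if ranked else None
```

Only the best section is returned to the IDE; the developer can still open the full page if that section is not enough.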
These are the metrics we used for ranking of different sections from a web page.
We consider content density, which refers to the purity of the content. So, if it is only text, then the content is pure.
But if it contains hyperlinks like ads, widgets, or anything that sounds like noise, then the content is noisy.
We also consider content relevance, which indicates whether the section in the page discusses the relevant exception or not.
The content should contain relevant tokens, method calls, or a relevant code snippet, possibly similar to the code in the IDE.
We finally combine these two aspects, content density and content relevance, to derive a content score for each of the sections of the page.
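The two aspects can be illustrated with a small sketch. The formulas and the weight below are assumptions for illustration only, not the paper's exact metric definitions:

```python
# Illustrative sketch of the two aspects combined into a content score.
# The exact formulas and the weight alpha are assumptions, not the
# paper's actual definitions.

def content_density(text_len, link_text_len):
    """Fraction of a node's text that is NOT hyperlink (ad/widget) text."""
    if text_len == 0:
        return 0.0
    return (text_len - link_text_len) / text_len

def content_relevance(node_tokens, exception_tokens):
    """Token overlap between a node's text and the exception context."""
    if not node_tokens:
        return 0.0
    return len(set(node_tokens) & set(exception_tokens)) / len(set(node_tokens))

def content_score(density, relevance, alpha=0.5):
    """Weighted combination of the two aspects (alpha is an assumed weight)."""
    return alpha * density + (1 - alpha) * relevance
```

A section that is pure text but off-topic, or on-topic but buried in links, scores lower than one that is both pure and relevant.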
Now, lets take a look into our technique that determines the most relevant section from a given web page.
Suppose, the web page has a structure like this, and our task is to identify the most relevant node.
This is the DOM structure of an HTML page, which means a document can be represented as a tree.
The first step is to delete the non-text items such as style, script or img tags.
The next step is to determine the content score for each of the nodes of the tree.
These are the direct children of the body node.
Now, delete any child of body whose score falls below a threshold; here, we use the score of body as the threshold.
For example, this one is deleted.
Now, we go through each of the remaining children and look for the maximum score holder.
For example, this TD contains the purest and most relevant content for the exception.
What our technique does is mark that TD as content and the rest of the siblings as noise.
And this process backtracks up to the body node.
Thus, we get a DOM tree where each node is annotated as either content or noise.
Here, the bold colored nodes are content nodes.
Now we just keep the content nodes and discard the noisy nodes.
This way, we keep the page structure but isolate the purest and most relevant content for the encountered exception.
This is the most relevant section from the web page.
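The walk described in these notes can be sketched roughly as follows. The Node class, the scores, and the greedy descent are simplified assumptions; the real tool operates on an HTML DOM using the density and relevance metrics:

```python
# Simplified sketch of the content/noise annotation walk described above.
# Scores are given directly; the real tool derives them from content
# density and relevance on an HTML DOM.

class Node:
    def __init__(self, tag, score=0.0, children=None):
        self.tag = tag
        self.score = score
        self.children = children or []
        self.is_content = False

def mark_content(node):
    """Mark the best-scoring child chain as content; siblings stay noise."""
    node.is_content = True
    if not node.children:
        return
    # Prune children scoring below the parent's score (the threshold).
    kept = [c for c in node.children if c.score >= node.score]
    if not kept:
        return
    # The maximum score holder is content; its siblings are noise.
    best = max(kept, key=lambda c: c.score)
    mark_content(best)

def extract(node):
    """Collect the tags on the content path (the relevant section)."""
    path = [node.tag] if node.is_content else []
    for c in node.children:
        path += extract(c)
    return path
```

Running this on a toy tree with a high-scoring TD under a table keeps only the body → table → tr → td chain and discards the sibling nodes as noise.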
Now comes the experiments.
We conducted experiments with 250 web pages related to 80 programming exceptions.
The gold set, the most relevant sections of each page, was created from 25 hours of manual work, and it is used for both evaluation and validation purposes.
We also conduct experiments with Stack Overflow posts and compare our performance with 2 information retrieval techniques.
We used these three performance metrics for evaluation and validation.
The key idea here is the longest common subsequence of words.
That means we compute the word overlap between the retrieved content and the gold content to derive these metrics.
The same idea was used by earlier techniques in the relevant literature.
We try to answer these four research questions from our experiments.
How does our technique perform in general?
How effective are our proposed metrics– content density and content relevance ?
Can our technique perform better than the baseline technique from the literature?
Can it perform better than established IR techniques such as the Vector Space Model and Latent Semantic Indexing?
Here we note that when only content density is considered, the performance is not very good.
Content relevance also does not work well alone.
However, when both metrics are combined, the statistics are interesting.
We get about 82% precision, 77% recall and about 76% F1-measure, which are promising according to literature.
We compared with a closely-related technique from literature.
We also divide the dataset into different sets, and conduct experiments.
Our technique performs better than the baseline technique for all sets and all metrics.
One possible explanation for this performance is probably our content relevance paradigm.
We also used the box plot to analyze the comparison results.
We note that our performance measures are undoubtedly higher than the competing technique.
Our measures have less variance, and the medians are close to 90% or above.
On the other hand, the baseline technique has relatively lower measures.
Since the gold set in the first experiment was manually developed, it might contain some subjective bias.
We thus also conducted experiments using Stack Overflow posts, where we try to find out the accepted solution as well as the top-voted answer from the page.
Then we also compare with two established IR techniques for the same task.
We found that our technique performs relatively better than those 2 techniques.
Our technique performs better especially in recall, and therefore in F-measure metrics.
We do more investigation using the box plot.
Here, we see our measures have relatively more variance, but better performance in recall and f-measure.
Mapping such posts from an SO page is an especially real challenge, since it involves a lot of factors. However, our technique is found relatively promising compared to the traditional alternatives we have so far.
We identified three threats to the validity of our findings.
The gold set might contain some subjective bias, and it is hard to remove it completely. However, we performed a second experiment to handle this threat.
The usability of the technique is not properly evaluated yet. However, we did that evaluation on a limited scale.
So, these are take-home messages.
Developers spend about 19% of their time on web search, and mapping information from the IDE to the web browser can be a real challenge.
Our technique addresses that concern, and maps exception information from IDE to a web page.
We consider purity and relevance of the content for extracting the most relevant sections from a page.
Our technique performs better than a baseline technique and 2 IR techniques.
The tool can be found online for testing.