ContentSuggest--Recommendation of Relevant Sections from a Webpage about Errors & Exceptions

Programming errors or exceptions are inherent in software development and maintenance, and in today's Internet era, software developers often turn to the web for working solutions. They use a search engine to retrieve relevant pages and then look for appropriate solutions by going through the pages manually, one by one. However, both checking a page's content against a given exception (and its context) and working out an appropriate solution are non-trivial tasks. They are even more complex and time-consuming given the bulk of irrelevant (i.e., off-topic) and noisy (e.g., advertisements) content in a web page. In this paper, we propose an IDE-based, context-aware page content recommendation approach that locates and recommends relevant sections of a web page by exploiting the technical details, in particular the context, of an exception encountered in the IDE. A preliminary evaluation with 250 web pages related to 80 programming errors and exceptions, and a comparison against one existing approach, show that the proposed approach is highly promising in terms of precision, recall, and F1-measure.

Slide 1: RECOMMENDING RELEVANT SECTIONS FROM A WEBPAGE ABOUT PROGRAMMING ERRORS AND EXCEPTIONS
Mohammad Masudur Rahman and Chanchal K. Roy
Software Research Lab, Department of Computer Science, University of Saskatchewan, Canada
25th Center for Advanced Studies Conference (CASCON 2015)
Slide 2: Exception triggering point
Slide 3: SOLVING EXCEPTION (STEP I: QUERY SELECTION)
 Selection of a traditional search query
 Switching to the web browser for web search
 This query may not be sufficient for most exceptions
Slide 4: SOLVING EXCEPTION (STEP II: WEB SEARCH)
 The browser does NOT know the context (i.e., details) of the exception
 Ranking is not very helpful
 Forces the developer to SWITCH back and forth between the IDE and the browser
 Trial and error in searching
 19% of development time goes into web search
 Switching is often distracting
Slide 5: SOLVING EXCEPTION (STEP III: MAPPING TO PAGE SECTIONS)
 Mapping between the exception and relevant page sections is non-trivial
 Proposed: automated mapping between the exception and relevant page sections
 IDE-based suggestion of web page content for review
Slide 6: OUTLINE OF THIS TALK
 ContentSuggest architecture
 Metrics & algorithm
 Empirical evaluation & validation (using web pages)
 Validation against IR techniques (using Stack Overflow posts)
 Conclusion
Slide 7: CONTENTSUGGEST: ARCHITECTURE
[Architecture diagram: processing pipeline from Start to End]
Slide 8: PROPOSED METRICS
 Content Density (CTD): Text Density (TD), Link Density (LD), Code Density (CD); captures purity of the textual content (fewer hyperlinks)
 Content Relevance (CTR): Text Relevance (TR), Code Relevance (CR); captures relevance of the textual content to the exception (interesting tokens)
 Content Score (CTS) = γ * Content Density + δ * Content Relevance
 All metrics are normalized
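The combined score on this slide can be sketched in code. Note this is a minimal illustration, not the authors' implementation: the way the density and relevance components are combined from their sub-metrics, and the default weights, are assumptions; the slide only specifies the weighted sum CTS = γ·CTD + δ·CTR over normalized inputs.

```python
# Hypothetical sketch of the Content Score (CTS) from the slide.
# All inputs are assumed to be normalized to [0, 1] already.

def content_score(text_density, link_density, code_density,
                  text_relevance, code_relevance,
                  gamma=0.5, delta=0.5):
    """CTS = gamma * Content Density + delta * Content Relevance."""
    # Content Density rewards dense text/code and penalizes link-heavy
    # sections (an assumed combination of TD, LD, CD).
    content_density = (text_density + code_density) - link_density
    # Content Relevance reflects lexical overlap with the exception
    # context (an assumed combination of TR and CR).
    content_relevance = text_relevance + code_relevance
    return gamma * content_density + delta * content_relevance
```

With the default weights, a section with maximal density and relevance and no links would score `content_score(1, 0, 1, 1, 1) == 2.0`.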
Slide 9: PROPOSED TECHNIQUE (CONTENTSUGGEST)
[Example DOM tree: HTML with HEAD (TITLE, STYLE, SCRIPT) and BODY (DIV, H1, P, B, OL/LI, TABLE/TBODY/TR/TD) nodes containing text]
Slide 10: PROPOSED TECHNIQUE (CONTENTSUGGEST): SCORING
[The same DOM tree with STYLE and SCRIPT removed; each remaining text-bearing node is scored]
Slide 11: PROPOSED TECHNIQUE (CONTENTSUGGEST): TAGGING
[Each scored node in the DOM tree is tagged as Content or Noise]
Slide 12: PROPOSED TECHNIQUE (CONTENTSUGGEST): FILTERING
[Noise-tagged nodes are filtered out, leaving only the Content sections of the tree]
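The score-tag-filter pass over the DOM tree in slides 9-12 can be sketched as a recursive prune. The `Node` structure, the fixed threshold, and the hard-coded noise tags are illustrative assumptions; the actual technique tags nodes using the Content Score metric, not a single fixed cutoff.

```python
# Illustrative sketch of filtering a scored DOM tree into Content vs Noise.
# Node, NOISE_TAGS, and the threshold are assumptions for this example.

class Node:
    def __init__(self, tag, score=0.0, children=None):
        self.tag = tag
        self.score = score          # node's content score, assumed in [0, 1]
        self.children = children or []

NOISE_TAGS = {"style", "script"}    # dropped outright, as on slide 10

def filter_content(node, threshold=0.5):
    """Return a copy of the subtree keeping only Content-tagged nodes."""
    if node.tag in NOISE_TAGS:
        return None                 # structural noise: prune immediately
    kept = [c for c in (filter_content(ch, threshold)
                        for ch in node.children) if c is not None]
    # Keep a node if it scores as Content itself, or if it still holds
    # Content descendants (so the tree stays connected).
    if kept or node.score >= threshold:
        return Node(node.tag, node.score, kept)
    return None
```

For example, a BODY holding a SCRIPT, a high-scoring DIV, and a low-scoring DIV would keep only the high-scoring DIV after filtering.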
Slide 13: EXPERIMENTS
 Dataset: 80 exceptions + 250 web pages
 Manual analysis (25 hours) to prepare the gold sections
 Evaluation of ContentSuggest and validation against Sun et al.
 Stack Overflow posts (from the SO crowd) used to validate against IR techniques (VSM, LSA)
Slide 14: PERFORMANCE METRICS
 Precision (P): % of the retrieved content (a) that belongs to the gold content (b) of the page
 Recall (R): % of the gold content (b) that is retrieved (a) by the technique
 F1-measure (F1): combination of Precision (P) and Recall (R)

P = |LCS(a, b)| / |a|
R = |LCS(a, b)| / |b|
F1 = 2 * P * R / (P + R)
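The LCS-based metrics above translate directly into code. This is a sketch under one assumption: that LCS is computed over word tokens (the slide does not say at what granularity the content is compared).

```python
# P, R, F1 over the longest common subsequence (LCS) of two token lists.

def lcs_len(a, b):
    """Length of the LCS of token lists a and b (classic DP, O(|a|*|b|))."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0                    # dp value of the previous diagonal cell
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def precision_recall_f1(retrieved, gold):
    """P = |LCS(a,b)|/|a|, R = |LCS(a,b)|/|b|, F1 = 2PR/(P+R)."""
    a, b = retrieved.split(), gold.split()   # assumed: word-level tokens
    m = lcs_len(a, b)
    p = m / len(a) if a else 0.0
    r = m / len(b) if b else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, retrieved "a b c d" against gold "a c d e" shares the subsequence "a c d", giving P = R = F1 = 0.75.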
Slide 15: RESEARCH QUESTIONS (4)
 RQ1: How effective is ContentSuggest in recommending relevant content from a web page?
 RQ2: How effective are the proposed metrics in identifying relevant page content?
 RQ3: Can ContentSuggest outperform the baseline technique?
 RQ4: Does ContentSuggest perform better than IR techniques (VSM, LSI) in identifying relevant content?
Slide 16: ANSWERING RQ1 & RQ2: EVALUATION OF TECHNIQUE & METRICS

Scores                                   Metric  SO-Pages  Non-SO Pages  All Pages
{ Content Density }                      MP      50.91%    49.50%        50.07%
                                         MR      91.74%    75.71%        82.18%
                                         MF      62.32%    53.76%        57.22%
{ Content Relevance }                    MP      86.63%    69.17%        76.23%
                                         MR      52.17%    57.66%        55.44%
                                         MF      61.07%    55.88%        57.98%
{ Content Density, Content Relevance }   MP      92.64%    74.60%        81.96%
(Proposed Technique)                     MR      74.17%    78.51%        76.74%
                                         MF      80.95%    73.09%        76.30%

[SO = Stack Overflow, MP = Mean Precision, MR = Mean Recall, MF = Mean F1-measure]
Slide 17: ANSWERING RQ3: COMPARISON WITH BASELINE TECHNIQUE

Content Extractor          Metric  SO-Pages  Non-SO Pages  All Pages
Sun et al. (SIGIR 2011)    MP      52.63%    38.89%        44.44%
                           MR      86.49%    41.84%        59.88%
                           MF      62.57%    34.49%        45.84%
ContentSuggest             MP      92.64%    74.60%        81.96%
(Proposed Technique)       MR      74.17%    78.51%        76.74%
                           MF      80.95%    73.09%        76.30%

[SO = Stack Overflow, MP = Mean Precision, MR = Mean Recall, MF = Mean F1-measure]

 Performed better for all three sets of pages: SO pages, non-SO pages, and all pages
 Performed better on all metrics: precision, recall, and F1-measure
Slide 18: ANSWERING RQ3: COMPARISON WITH BASELINE TECHNIQUE
[Charts comparing ContentSuggest with the baseline technique]
Slide 19: ANSWERING RQ4: COMPARISON WITH IR TECHNIQUES (VSM, LSI)

Content Extractor             Metric  Accepted Posts  Most Voted Posts
Latent Semantic Analysis      MP      19.98%          23.02%
(Marcus et al., ICSE 2003)    MR      21.78%          23.17%
                              MF      18.43%          21.07%
Vector Space Model            MP      22.50%          33.89%
(Antoniol et al., TSE 2002)   MR      23.08%          31.90%
                              MF      19.77%          30.44%
ContentSuggest                MP      23.10%          31.36%
(Proposed Technique)          MR      45.15%          54.42%
                              MF      26.99%          35.90%
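For context on the Vector Space Model baseline compared above: VSM ranks a candidate section by the cosine similarity between its term vector and the query's. A minimal sketch follows, assuming raw term frequencies and whitespace tokenization (the cited baseline would typically add tf-idf weighting and stemming, which are omitted here):

```python
# Minimal VSM similarity: cosine over raw term-frequency vectors.
from collections import Counter
from math import sqrt

def cosine_sim(doc_a, doc_b):
    """Cosine similarity between two whitespace-tokenized texts."""
    va, vb = Counter(doc_a.split()), Counter(doc_b.split())
    dot = sum(va[t] * vb[t] for t in va)          # shared-term products
    na = sqrt(sum(c * c for c in va.values()))    # vector norms
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Identical texts score 1.0 and texts with no shared terms score 0.0; a query built from the exception context would be scored against each candidate section and the highest-scoring sections returned.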
Slide 20: ANSWERING RQ4: COMPARISON WITH IR TECHNIQUES (VSM, LSI)
[Charts comparing ContentSuggest with VSM and LSI]
Slide 21: THREATS TO VALIDITY
 Gold content preparation: may contain subjective bias despite cross-validation
 Limited training dataset: metric weights were trained on a limited dataset
 Usability concern: a full-fledged user study is required to validate the applicability of the technique; only a limited study with 6 participants was performed
Slide 22: TAKE-HOME MESSAGE
 19% of development time is spent simply on web search (Brandt et al., SIGCHI 2009)
 Mapping between information in the IDE and in a web page can be non-trivial and time-consuming
 ContentSuggest automates such mapping in the context of exception handling
 Content Density and Content Relevance are found effective in identifying relevant sections of a web page
 ContentSuggest outperforms one baseline technique and two IR techniques (VSM, LSI)
Slide 23: THANK YOU!!
Slide 24: REFERENCES
[1] J. Brandt, P. J. Guo, J. Lewenstein, M. Dontcheva, and S. R. Klemmer. Two Studies of Opportunistic Programming: Interleaving Web Foraging, Learning, and Writing Code. In Proc. SIGCHI, pages 1589-1598, 2009.
[2] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering Traceability Links between Code and Documentation. TSE, 28(10):970-983, 2002.
[3] A. Marcus and J. I. Maletic. Recovering Documentation-to-Source-Code Traceability Links Using Latent Semantic Indexing. In Proc. ICSE, pages 125-135, 2003.
[4] F. Sun, D. Song, and L. Liao. DOM Based Content Extraction via Text Density. In Proc. SIGIR, pages 245-254, 2011.
[5] L. Ponzanelli, A. Bacchelli, and M. Lanza. Seahawk: Stack Overflow in the IDE. In Proc. ICSE, pages 1295-1298, 2013.
[6] M. M. Rahman, S. Yeasmin, and C. Roy. Towards a Context-Aware IDE-Based Meta Search Engine for Recommendation about Programming Errors and Exceptions. In Proc. CSMR-WCRE, pages 194-203, 2014.
[7] ContentSuggest Web Portal. URL http://www.usask.ca/~mor543/contentsuggest
[8] C. K. Roy and J. R. Cordy. NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization. In Proc. ICPC, pages 172-181, 2008.
