SurfClipse: An IDE-Based Context-Aware Meta Search Engine (ERA Track)

AN IDE-BASED CONTEXT-AWARE
META SEARCH ENGINE
Mohammad Masudur Rahman, Shamima Yeasmin, and
Chanchal K. Roy
Department of Computer Science
University of Saskatchewan
20th Working Conference on Reverse Engineering
(WCRE 2013), Koblenz, Germany

SOFTWARE MAINTENANCE, BUGS &
EXCEPTIONS

EXCEPTION HANDLING: IDE SUPPORT

EXCEPTION HANDLING: DEVELOPERS
(NOVICE & EXPERT)

EXCEPTION HANDLING: WEB SEARCH

IDE-BASED WEB SEARCH
 About 80% of effort goes to software maintenance
 Bug fixing: error and exception handling
 Developers spend about 19% of their time on web search
 Traditional web search
    Does not consider the context of the search (no ties between IDE and web browser)
    Requires context switching and is distracting
    Is time-consuming
    Is often not very productive
o IDE-based context-aware search addresses those issues.

EXISTING RELATED WORK
 Cordeiro et al. (RSSE 2012): context-based recommendation system
 Ponzanelli et al. (ICSE 2013): Seahawk
 Poshyvanyk et al. (IWICSS 2007): COTS search (Google Desktop) integrated into the Eclipse IDE
 Brandt et al. (SIGCHI 2010): Google web search integrated into the IDE

MOTIVATION EXPERIMENTS
Search Query          Common Results   Google Only   Yahoo Only   Bing Only
Content Only          32               9             16           18
Content and Context   47               9             11           10
 83 exceptions
 Solutions found for at most 58 exceptions

THE KEY IDEA: META SEARCH ENGINE

PROPOSED IDE-BASED META SEARCH MODEL

PROPOSED IDE-BASED META SEARCH MODEL
 Distinguishing features
    Meta search engine: collects data from multiple search engines
    More precise context: both the stack trace and the associated code serve as the exception context
    Popularity and confidence of result links
    Complete web browsing experience within the IDE

PROPOSED METRICS & SCORES
 Title-to-Title Matching Score (S_title): cosine similarity measurement
 Stack Trace Matching Score (S_st): SimHash-based similarity measurement
 Code Context Matching Score (S_cc): SimHash-based similarity measurement
 StackOverflow Vote Score (S_so): sum of the differences between up votes and down votes over all posts in the link
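
The two similarity measures named above can be sketched as follows. This is a minimal illustration, not the SurfClipse implementation: the tokenizer, the use of term-frequency vectors, the MD5-based token hash, and the 64-bit fingerprint size are all assumptions, and every function name is hypothetical.

```python
import hashlib
import math
import re
from collections import Counter

BITS = 64  # assumed fingerprint width; the slides do not state one


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simhash(text):
    """SimHash fingerprint: each token votes on every bit, weighted by its count."""
    weights = [0] * BITS
    for token, count in Counter(tokenize(text)).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        for i in range(BITS):
            weights[i] += count if (h >> i) & 1 else -count
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def simhash_similarity(a, b):
    """Fraction of fingerprint bits on which the two texts agree."""
    distance = bin(simhash(a) ^ simhash(b)).count("1")  # Hamming distance
    return 1.0 - distance / BITS
```

Under these assumptions, `cosine_similarity` would compare a query title with a result title, while `simhash_similarity` would compare a stack trace (or code context) from the IDE with one extracted from a result page. Both return values in [0, 1].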

PROPOSED METRICS & SCORES
 Top Ten Score (S_tt): position of the result link in the top 10 of each provider
 PageRank Score (S_pr): relative popularity among all links in the corpus, computed with the PageRank algorithm
 Site Traffic Rank Score (S_str): Alexa and Compete rank of each link
 Search Engine Weight (S_sew): relative reliability or importance of each search engine, based on experiments with 75 programming queries against the search engines
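
As an illustration of the PageRank Score, the following is a minimal power-iteration sketch of the standard PageRank algorithm over a small link graph. How SurfClipse builds its link corpus is not described on the slide, so the input format, the damping factor of 0.85, the fixed iteration count, and the function name are all assumptions.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Return a PageRank value for every page in a link graph.

    links maps each page to the list of pages it links to (assumed format).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

The returned values sum to 1, so a link's score can be read as its relative popularity within the corpus. The raw values would still go through the normalization step before being combined with other scores.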

METRICS NORMALIZATION
 Normalization applied to S_st, S_cc, S_so, S_tt, S_pr and S_str
 Avoids bias toward any particular aspect
\[ S_{i,\mathrm{normalized}} = \frac{S_i - \min(S_i)}{\max(S_i) - \min(S_i)} \]
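
Min-max normalization, as applied to the raw scores on this slide, can be sketched as below. The function name is hypothetical, and the handling of the case where all scores are equal (returning zeros to avoid dividing by zero) is an assumption the slide does not address.

```python
def min_max_normalize(scores):
    """Rescale a non-empty list of raw scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores carry no ranking signal, so map them to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

For example, `min_max_normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`. Applying this to each metric separately puts all of them on a common scale before they are averaged.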




FINAL SCORE COMPONENTS
 Content Relevance: S_cnt = S_title
 Context Relevance: S_cxt = (S_st + S_cc) / 2
 Link Popularity: S_pop = (S_so + S_pr + S_str) / 3
 Search Engine Confidence: S_ser = S_sew × S_tt
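
The four components can be computed from the individual metrics as in the sketch below. The class, field, and function names are hypothetical. The slide defines the components but not how they are merged into a single ranking score, so the unweighted sum in `final_score` is an assumption, not the authors' stated method.

```python
from dataclasses import dataclass


@dataclass
class LinkScores:
    """Per-link metric values, assumed to be already normalized where required."""
    title: float          # S_title
    stack_trace: float    # S_st
    code_context: float   # S_cc
    so_votes: float       # S_so
    page_rank: float      # S_pr
    site_traffic: float   # S_str
    engine_weight: float  # S_sew
    top_ten: float        # S_tt


def components(s):
    """Compute the four score components exactly as defined on the slide."""
    return {
        "content": s.title,
        "context": (s.stack_trace + s.code_context) / 2,
        "popularity": (s.so_votes + s.page_rank + s.site_traffic) / 3,
        "confidence": s.engine_weight * s.top_ten,
    }


def final_score(s, use=("content", "context", "popularity", "confidence")):
    """Assumption: combine the selected components by an unweighted sum."""
    parts = components(s)
    return sum(parts[name] for name in use)
```

The `use` parameter mirrors the component combinations evaluated in the experiments, for example `final_score(s, use=("content", "context", "confidence"))`.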

EXPERIMENT OVERVIEW
 25 exceptions collected from Eclipse IDE workspaces
 Related to the Eclipse plug-in framework and Java application development
 Solutions chosen through exhaustive web search, cross-validated by peers
 Recommended results manually validated

EXPERIMENTAL RESULTS
Score                        Top10   Rank10   Top20   Rank20
S_cnt                        10      3.60     16      8.63
S_cnt, S_cxt                 11      3.00     16      7.43
S_cnt, S_pop                 13      4.69     18      8.11
S_cnt, S_ser                 23      4.39     23      4.39
S_cnt, S_cxt, S_pop          13      4.07     18      7.61
S_cnt, S_cxt, S_ser          24      4.45     24      4.45
S_cnt, S_cxt, S_ser, S_pop   23      4.26     24      4.54
Top10: No. of test cases solved when the top 10 results are considered.
Rank10: Average rank of solutions when the top 10 results are considered.
Top20, Rank20: The same measures when the top 20 results are considered.

USER STUDY
 Five interesting exception test cases
 Five CS graduate research students as participants
 Top 10 results from SurfClipse presented to the participants in random order
    Avoids bias toward choosing top-rated solutions
 64.28% agreement found

USER STUDY RESULTS
Question ID   ANSR   ANSM   Agreement
Q1            2.8    2.0    71.43%
Q2            4.6    2.8    60.87%
Q3            4.6    2.4    52.17%
Q4            4.2    3.0    71.43%
Q5            5.8    3.8    65.52%
Overall       4.4    2.8    64.28%
ANSR: Avg. no. of solutions recommended by the participants.
ANSM: Avg. no. of those solutions that match the ones recommended by our approach.
Agreement: % of agreement between the two sets of solutions.
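
The figures in this table are consistent with Agreement = ANSM / ANSR for each question, and with the overall agreement being the mean of the five per-question values. That relationship is inferred from the numbers and is not stated on the slide. The short sketch below reproduces it; the variable and function names are hypothetical.

```python
# (ANSR, ANSM) per question, copied from the table above.
rows = {
    "Q1": (2.8, 2.0),
    "Q2": (4.6, 2.8),
    "Q3": (4.6, 2.4),
    "Q4": (4.2, 3.0),
    "Q5": (5.8, 3.8),
}


def agreement(ansr, ansm):
    """Percentage of participant-recommended solutions matched by the approach."""
    return 100.0 * ansm / ansr


per_question = {q: agreement(ansr, ansm) for q, (ansr, ansm) in rows.items()}

# Inferred: the overall figure is the mean of the per-question agreements.
overall = sum(per_question.values()) / len(per_question)

for q, value in per_question.items():
    print(f"{q}: {value:.2f}%")
print(f"Overall: {overall:.2f}%")
```

Note that dividing the overall ANSM by the overall ANSR (2.8 / 4.4) gives about 63.64%, not the reported 64.28%, which is why the mean of per-question values is the reading used here.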

THREATS TO VALIDITY
 Search is not real time yet.
 Different aspects need different weights.

LATEST UPDATES
 A distributed model for IDE-based web search: client-server architecture with a remotely hosted web service
 Parallel processing in computation
 Two modes of operation: proactive and interactive
 Granular refinement of metrics and assignment of relative weights (i.e., importance)
 Complete IDE-based web search solution

CONCLUSION & FUTURE WORK
 A novel IDE-based search with meta search capabilities
    Exploits existing search service providers
    Considers content, context, popularity and search engine confidence of a result
 Recommends the correct solution for 24 (96%) of 25 test cases
 64.28% agreement in the user study
 Needs more extensive experiments and user studies
 Metrics need to be fine-tuned and made more granular
