An IDE-Based Context-Aware Meta Search Engine
1. 20th Working Conference on Reverse Engineering (WCRE 2013), Koblenz, Germany
AN IDE-BASED CONTEXT-AWARE META SEARCH ENGINE
Mohammad Masudur Rahman, Shamima Yeasmin, and Chanchal K. Roy
Department of Computer Science, University of Saskatchewan
6. IDE-BASED WEB SEARCH
About 80% of effort is spent on software maintenance
Bug fixing – error and exception handling
Developers spend about 19% of their time on web search
Traditional web search
Does not consider the context of the search (no ties between the IDE and the web browser)
Context-switching and distracting
Time-consuming
Often not very productive
o An IDE-based context-aware search addresses these issues.
7. EXISTING RELATED WORKS
Cordeiro et al. (RSSE 2012) – context-based recommendation system
Ponzanelli et al. (ICSE 2013) – Seahawk
Poshyvanyk et al. (IWICSS 2007) – COTS (Google Desktop) integrated into the Eclipse IDE
Brandt et al. (SIGCHI 2010) – integrating Google web search into the IDE
8. MOTIVATION EXPERIMENTS
83 exceptions; solutions found for at most 58 of them.

Search Query        | Common Results | Google Only | Yahoo Only | Bing Only
Content Only        | 32             | 09          | 16         | 18
Content and Context | 47             | 09          | 11         | 10
11. PROPOSED IDE-BASED META SEARCH MODEL
Distinguishing Features
Meta search engine – captures data from multiple search engines
More precise context – both the stack trace and the associated code serve as the exception context
Popularity and confidence of result links
Complete web browsing experience within the IDE
12. PROPOSED METRICS & SCORES
Title-to-Title Matching Score (Stitle) – cosine similarity measurement
Stack Trace Matching Score (Sst) – SimHash-based similarity measurement
Code Context Matching Score (Scc) – SimHash-based similarity measurement
StackOverflow Vote Score (Sso) – sum of the differences between up-votes and down-votes over all posts in the link
13. PROPOSED METRICS & SCORES
Top Ten Score (Stt) – position of the result link within the top 10 of each provider
Page Rank Score (Spr) – relative popularity among all links in the corpus, using the PageRank algorithm
Site Traffic Rank Score (Sstr) – Alexa and Compete ranks of each link
Search Engine Weight (Ssew) – relative reliability or importance of each search engine, estimated from experiments with 75 programming queries against the search engines
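The Page Rank Score can be pictured with a minimal iterative PageRank over a toy link graph; the graph, the damping factor, and the URLs below are illustrative assumptions, not data from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping node -> list of outbound nodes. Returns a rank per node."""
    nodes = set(links) | {n for outs in links.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, outs in links.items():
            if outs:
                share = damping * rank[node] / len(outs)
                for target in outs:
                    new[target] += share
            else:  # dangling node: spread its rank evenly over all nodes
                for n in nodes:
                    new[n] += damping * rank[node] / len(nodes)
        rank = new
    return rank

# Toy corpus of result links (hypothetical)
graph = {
    "so.com/q1": ["blog.com/a"],
    "blog.com/a": ["so.com/q1", "docs.com/x"],
    "docs.com/x": [],
}
ranks = pagerank(graph)
```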
14. METRICS NORMALIZATION
S_i,normalized = (S_i − min(S_i)) / (max(S_i) − min(S_i))
Normalization applied to Sst, Scc, Sso, Stt, Spr and Sstr
Avoids bias toward any particular aspect
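The min-max normalization above is straightforward to sketch; the raw values in the example are made up.

```python
def normalize(scores):
    """Min-max normalize a list of raw metric values to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all values equal: no spread to normalize over
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw StackOverflow vote scores for five result links
raw = [3, 15, 0, 7, 15]
normalized = normalize(raw)  # the minimum maps to 0.0, the maximum to 1.0
```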
15. FINAL SCORE COMPONENTS
Content Relevance
Scnt=Stitle
Context Relevance
Scxt=(Sst + Scc)/2
Link Popularity
Spop=(Sso +Spr + Sstr)/3
Search Engine Confidence
Sser=(Ssew x Stt)
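The four component scores can be combined as a small function. The equal-share averaging follows the talk (the speaker notes state each component gets an equal share); the numeric inputs are hypothetical.

```python
def final_score(s_title, s_st, s_cc, s_so, s_pr, s_str, s_sew, s_tt):
    """Combine normalized metric scores into the four components,
    then average them with equal shares, as stated in the talk."""
    s_cnt = s_title                    # content relevance
    s_cxt = (s_st + s_cc) / 2          # context relevance
    s_pop = (s_so + s_pr + s_str) / 3  # link popularity
    s_ser = s_sew * s_tt               # search engine confidence
    return (s_cnt + s_cxt + s_pop + s_ser) / 4

# Hypothetical normalized metric values for one result link
score = final_score(0.8, 0.6, 0.4, 0.9, 0.5, 0.7, 1.0, 0.3)
```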
16. EXPERIMENT OVERVIEW
25 exceptions collected from Eclipse IDE workspaces
Related to the Eclipse plug-in framework and Java application development
Solutions chosen from an exhaustive web search, cross-validated by peers
Recommended results manually validated
17. EXPERIMENTAL RESULTS

Score                  | Top 10 | Rank10 | Top 20 | Rank20
Scnt                   | 10     | 3.60   | 16     | 8.63
Scnt, Scxt             | 11     | 3.00   | 16     | 7.43
Scnt, Spop             | 13     | 4.69   | 18     | 8.11
Scnt, Sser             | 23     | 4.39   | 23     | 4.39
Scnt, Scxt, Spop       | 13     | 4.07   | 18     | 7.61
Scnt, Scxt, Sser       | 24     | 4.45   | 24     | 4.45
Scnt, Scxt, Sser, Spop | 23     | 4.26   | 24     | 4.54

Top10: no. of test cases solved when the top 10 results are considered
Rank10: average rank of solutions when the top 10 results are considered
18. USER STUDY
Five interesting exception test cases
Five CS graduate research students as participants
Top 10 results from SurfClipse presented to the participants in random order
To avoid the bias of choosing top-rated solutions
64.28% agreement found
19. USER STUDY RESULTS

Question ID | ANSR | ANSM | Agreement
Q1          | 2.8  | 2.0  | 71.43%
Q2          | 4.6  | 2.8  | 60.87%
Q3          | 4.6  | 2.4  | 52.17%
Q4          | 4.2  | 3.0  | 71.43%
Q5          | 5.8  | 3.8  | 65.52%
Overall     | 4.4  | 2.8  | 64.28%

ANSR: avg. no. of solutions recommended by the participants
ANSM: avg. no. of solutions matched with those from our approach
Agreement: % of agreement between solutions
21. LATEST UPDATES
A distributed model for IDE-based web search – client-server architecture, remotely hosted web service
Parallel processing in computation
Two modes of operation – proactive and interactive
Granular refinement of metrics and assignment of relative weights (i.e., importance)
A complete IDE-based web search solution
22. CONCLUSION & FUTURE WORKS
A novel IDE-based search with meta search capabilities
Exploits existing search service providers
Considers the content, context, popularity and search engine confidence of a result
Recommends the correct solution for 24 (96%) of 25 test cases
64.28% agreement in the user study
Needs more extended experiments and user studies
Metrics need to be fine-tuned and made more granular
Good morning everyone. I am Masudur Rahman from the University of Saskatchewan. Welcome to my presentation. Here, I am going to present our paper titled "An IDE-Based Context-Aware Meta Search Engine". Basically, we proposed an IDE-based recommendation system that works like a meta search engine: it captures results from multiple search engines against a selected exception and then analyzes them to produce a better, context-relevant result set.
Studies show that about 50%-80% of effort is spent on software maintenance, and one major concern during maintenance is bug fixing. Software bugs are generally associated with different runtime errors and exceptions. To deal with those errors and exceptions, developers spend a lot of time, about 19% of their programming time. Why? Because traditional web search has no ties with the IDE. It does not consider the context of the problem the developer is facing, so the developer has to include the context information in the search query, which is challenging because it is not clear which terms are more important than others; basically, this becomes a trial-and-error approach for the developer, which is time-consuming. Besides, switching between the IDE and the web browser is distracting when you are trying to concentrate on a problem in the IDE. So, what is the solution? An IDE-based search engine, and it has to consider the problem context, of course.
There are some existing studies that try to address the issues of traditional web search. However, some of them are based solely on StackOverflow, for example the first two works. StackOverflow is a big source of information; it recently reached 1.9 million users with 12 million posts. Still, we cannot ignore the rest of the web, and that is where our approach comes into play. The other two works try to integrate Google Desktop search and Google web search into the IDE; in contrast, we are interested in exploiting multiple search engines to get a more confident set of results for the developer. The baseline idea is to leverage existing resources to solve technical challenges in a smart way.
This is our proposed meta search model for IDE-based recommendation. It has two modules: a client module and a computation module. Once the developer selects an exception from the Error Log or Console view, the client module captures the error message, the stack trace and the context code likely responsible for the exception, and sends them to the computation module. Upon getting the search request, the computation module sends the error message to multiple search engines. We use Google, Bing, Yahoo and the StackOverflow API to collect results, and we use them to develop the corpus. Once the corpus is developed, we apply our proposed metrics and algorithms to produce a result set that is relevant to the encountered exception. Now, what do the traditional search engines do? Do they consider context? No. It is the developer who has to represent the context, besides the error message, in the search query.
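The search request the client module sends can be pictured as a simple payload. The field names and example values here are hypothetical; the paper does not specify the wire format.

```python
from dataclasses import dataclass

@dataclass
class SearchRequest:
    """Hypothetical payload from the client module to the computation module."""
    error_message: str  # e.g., the first line of the exception
    stack_trace: list   # frames captured from the Error Log / Console view
    context_code: str   # source fragment likely responsible for the exception

req = SearchRequest(
    error_message="java.lang.ClassNotFoundException: org.example.Foo",
    stack_trace=["at java.net.URLClassLoader.findClass(URLClassLoader.java:381)"],
    context_code='Class.forName("org.example.Foo");',
)
```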
So basically, we are providing four essential things in this model. It exploits the idea of meta search (why meta search? let's discuss that in the Q/A session). We consider a more precise context: both the stack trace and the context code. We also consider the popularity and confidence of a result link. And the developer can readily browse the recommended web link within the IDE.
These are the metrics we consider to determine the relevance of a result page against the query exception. Please note that we collected the exception message and the exception context, in the form of the stack trace and context code, during the search request. Title-to-title matching tries to determine the content similarity between the exception message and the result page title; we use cosine similarity for that. Then comes the context information. We scraped the HTML and extracted the content from tags like pre, code and blockquote, as they are likely to contain context information about the problems discussed on the page. Once extracted, we use SimHash-based similarity to determine the relevance of the discussed problem to the query exception. SimHash produces a hash value for a block of content, and if the hash values of two blocks are close, the blocks are considered similar. We use this metric for both stack trace and context code matching.
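The SimHash idea just described can be sketched as follows: hash each token, accumulate signed bit votes, take the sign per bit, and compare fingerprints by Hamming distance. This is a generic SimHash sketch with assumed choices (MD5 token hashing, whitespace tokens), not the authors' exact code.

```python
import hashlib

def simhash(text, bits=64):
    """SimHash fingerprint of whitespace-separated tokens."""
    votes = [0] * bits
    for token in text.split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def simhash_similarity(a, b, bits=64):
    """1.0 when fingerprints match exactly; lower as Hamming distance grows."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits

# Hypothetical stack-trace fragments
sim = simhash_similarity("at java.io.FileReader open", "at java.io.FileReader read")
```

Near-duplicate blocks share most tokens, so their per-bit votes mostly agree and the fingerprints land close in Hamming distance.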
We also consider other metrics. Top Ten Score marks whether a result is found within the top 10 results of any search engine. Page Rank Score: we build an artificial network among the result links in the corpus to determine their relative importance using the PageRank algorithm. Site Traffic Rank Score: we collect the Alexa rank of each result link. Search Engine Weight: we calculate the support for each result from the search engines.
For most of the metrics, we use this formula to perform the normalization, if the metric is not already normalized.
So, we have got different perspectives on each result link, and now we use those perspectives to determine different types of scores. We get the content relevance, context relevance, popularity and search engine confidence of each result. Then we give all of them an equal share in the final score; we are now working on their relative weights.
We designed a limited experiment with 25 exceptions related to the Eclipse plug-in framework, and we got interesting results.
Here, we decompose the different component scores and show how effective each is in recommendation. We collected the top 10 and top 20 results and found that our algorithm can recommend solutions for up to 24 of the 25 exceptions. More interestingly, the solutions are mostly found within the top 5 positions.
We also tested the recommendations through a user study, because the approach is all about the user's benefit. We collected the top 10 recommended results for five exceptions and presented them to the participants in a randomized order. The idea is to check how developers apply their own sense of relevance.
Here, we got 64.28% agreement between our results and their confirmation. Basically, we tried to map their selections to the top 5 results for each exception and found that level of agreement. We also noticed that the results they marked as relevant were mostly found within the top 5 of our recommendations. So, as an initial attempt, the tool computes relevance quite accurately. However, it requires more extended experiments and an extended user study before we can claim anything solid.
Here are some of the latest updates on what we have done since then. We applied parallel processing in the computation model to make it faster; the present version is quite slow and not real-time. We implemented a client-server architecture so that the search can be platform-independent and any IDE can leverage it as a web service. We implemented two modes of operation, proactive and interactive, where the proactive version triggers automatically upon an exception. We also did more analysis of the metrics and scores.
To summarize, we proposed a novel IDE-based search approach that exploits the problem context and collects results like a meta search engine. Our preliminary experiments show some interesting results. Of course, the idea needs further experiments and testing to discover its potential, which is our future work, and we are working on that.
So, that is all for my talk. Thanks to all for your time. Questions?