AIRG Presentation

A presentation I gave to the WPI Artificial Intelligence Research Group in preparation for my Master's thesis defense.
Presentation Transcript

  • Mandorvol Browser
    Worcester Polytechnic Institute
    Kevin Menard
    December 8, 2005
  • Problem
      - Lots of information split over many documents
      - Search engines are now a necessity
      - Search engines are “dumb”
          - Document relevance is a mathematical formula, not a user rating
          - Easy to fool
          - Hard to find good info if it is in a “non-conforming” format
      - Users know relevance values but can’t be bothered
  • Solution
      - Use implicit user behavior in place of explicit feedback ratings
      - WPI Curious Browser
          - Discovered a set of implicit indicators that highly correlated with feedback values
      - Microsoft Curious Browser
          - Built upon the WPI work and collected user feedback
          - Used to train a classifier with explicit & implicit data to provide predictions of web page relevance
  • Our Work
      - Investigate the value of “voluntary” data
          - Previous work only used “mandatory” data
      - Mandorvol Browser
          - Extension of the MS Curious Browser
          - Collects data using both voluntary & mandatory feedback mechanisms
          - Collects data in controlled & uncontrolled scenarios
  • Mandorvol Browser
      - Uncontrolled scenario
          - User simply searches for anything on Google
      - Controlled scenario
          - User is given Excel tasks to complete
          - Most people have experience with Excel, but it’s complex enough that tasks can be chosen that will require help
          - Search is limited to Excel help assets
          - Search is performed via a custom Java web application that provides a Google-like interface to the Excel help assets
  • Informal Hypotheses
      - H1: Quality of voluntary data will be higher
          - Users will only offer feedback if they want to
          - Good for classifiers
      - H2: Quantity of mandatory data will be greater
          - Users must provide feedback for each page
          - Also good for classifiers
      - H3: Quantity of controlled data will be lower
          - Users completing tasks don’t want to be bothered
  • Timeline
      - 2004:
          - Development: Aug. – Nov.
          - Pilot Studies: Nov.
          - Dev, Testing, Deployment: Dec. – Feb.
      - 2005:
          - Major Study: March – April
          - Rudimentary Analysis: April – May
          - Detailed Analysis: Sep. – Dec.
      - 2006:
          - Conclusions & Thesis Write-up: Jan. – March
  • Pilot Studies
      - Goals
          - Test voluntary feedback mechanism
          - Test tasks for controlled situations
      - Key observations
          - Feedback band location matters
              - Horizontal VS vertical
          - “Banner ad” effect
              - Vertical band with bright colors
          - Double evaluation
          - Task-oriented users don’t provide feedback once they solve their problem
  • Study
      - Ran for two months in two phases
      - 161 total users across four experiment types
          - Mandatory Controlled (28)
          - Mandatory Uncontrolled (45)
          - Voluntary Controlled (48)
          - Voluntary Uncontrolled (40)
      - Share of the 161 users per experiment type:

                      Controlled   Uncontrolled
          Mandatory   17.39%       27.95%
          Voluntary   29.81%       24.84%
  • Feedback
      - Feedback Ratio
          - Amount of feedback / # search results

                      Controlled    Uncontrolled
          Mandatory   0.946043165   0.977690289
          Voluntary   0.745762712   0.918149466

      - Feedback Opportunities
          - Amount of feedback / # of opportunities to give feedback (see the sketch after this slide)

                      Controlled    Uncontrolled
          Mandatory   0.626190476   0.573518091
          Voluntary   0.408668731   0.606345476
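The raw counts behind these ratios are not shown on the slide, so purely as an illustration of how the two metrics are computed, here is a minimal Java sketch; every count and name below is hypothetical rather than a value from the study.

    // Illustrative only: the two feedback metrics defined on the slide above.
    // All counts are made up; the study's raw totals are not shown in the deck.
    public class FeedbackMetrics {
        public static void main(String[] args) {
            int feedbackGiven = 131;          // pages the user actually rated (hypothetical)
            int searchResults = 140;          // search results returned (hypothetical)
            int feedbackOpportunities = 209;  // pages where feedback could have been given (hypothetical)

            double feedbackRatio = (double) feedbackGiven / searchResults;
            double opportunityRatio = (double) feedbackGiven / feedbackOpportunities;

            System.out.printf("Feedback ratio:         %.3f%n", feedbackRatio);
            System.out.printf("Feedback opportunities: %.3f%n", opportunityRatio);
        }
    }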
  • Feedback Distribution
      - Normalized
      - No feedback not considered

                                  Satisfied   Partially Satisfied   Dissatisfied
        Mandatory Controlled      29.66%      23.57%                46.77%
        Mandatory Uncontrolled    46.85%      22.28%                30.87%
        Voluntary Controlled      50.76%      16.67%                32.58%
        Voluntary Uncontrolled    49.42%      21.71%                28.88%
  • Feedback Distribution (Cont.)
      - Not normalized
      - No feedback values included

                                  Satisfied   Partially Satisfied   Dissatisfied   No Feedback
        Mandatory Controlled      28.06%      22.30%                44.24%         5.40%
        Mandatory Uncontrolled    45.80%      21.78%                30.18%         2.23%
        Voluntary Controlled      37.85%      12.43%                24.29%         25.42%
        Voluntary Uncontrolled    45.37%      19.93%                26.51%         8.19%
  • High-level Analysis
      - A distinguished voluntary feedback mechanism yields high-quantity feedback
      - Data could be skewed by the nature of the study
          - Users are more apt to give feedback when searching leisurely in a known domain
              - E.g., if I search for “drums”, I know what to expect in the search results list, so I can better evaluate them
          - Users are more apt to give Satisfied feedback when searching leisurely
  • In-depth Analysis
      - What?
          - Build classifiers to investigate data qualities
      - How?
          - Weka – open-source machine learning tool
      - Why?
          - Similar to previous work – provides validation
          - Relates back to the original problem of improving search results
  • Data Preparation
      - Data pulled from the DB and turned into a Weka file (a rough sketch of this step follows below)
      - 15 data attributes
          - Experiment type, behavior type, behavior URL length, dwell time, page count in session, page order in search result list, page order in all search result lists, search result URL length, link text length, page description length, script length, file size, image count, exit type, feedback value
      - Allowed J48 to handle continuous data
      - Allowed J48 to handle missing values
          - Script length, file size, image count, & exit type only
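The presentation does not include the export code itself; the following is a minimal sketch of what turning the DB rows into a Weka-readable file could look like with the current Weka Java API. Attribute names, nominal values, and the sample row are paraphrased from the slide or invented, and only a few of the 15 attributes are spelled out.

    // A sketch only: builds a small Weka dataset with a few of the attributes named
    // on the slide and writes it out as ARFF. Names, nominal values, and the sample
    // row are hypothetical; the study's actual export code is not shown in the deck.
    import java.io.File;
    import java.util.ArrayList;
    import java.util.Arrays;

    import weka.core.Attribute;
    import weka.core.DenseInstance;
    import weka.core.Instances;
    import weka.core.converters.ArffSaver;

    public class MandorvolExport {
        public static void main(String[] args) throws Exception {
            ArrayList<Attribute> attrs = new ArrayList<>();
            attrs.add(new Attribute("experiment_type",
                    Arrays.asList("MC", "MU", "VC", "VU")));          // nominal
            attrs.add(new Attribute("dwell_time"));                   // numeric
            attrs.add(new Attribute("page_count_in_session"));        // numeric
            attrs.add(new Attribute("file_size"));                    // numeric, may be missing
            // ... the remaining behavioral attributes listed on the slide ...
            attrs.add(new Attribute("feedback",
                    Arrays.asList("satisfied", "partially_satisfied", "dissatisfied")));

            Instances data = new Instances("mandorvol", attrs, 0);
            data.setClassIndex(data.numAttributes() - 1);             // feedback value is the class

            // One hypothetical row; values never set (e.g. file_size here) stay missing,
            // which is what lets J48 handle missing values downstream.
            DenseInstance row = new DenseInstance(data.numAttributes());
            row.setDataset(data);
            row.setValue(attrs.get(0), "MC");
            row.setValue(attrs.get(1), 42.0);
            row.setValue(attrs.get(2), 7.0);
            row.setValue(attrs.get(4), "satisfied");
            data.add(row);

            ArffSaver saver = new ArffSaver();
            saver.setInstances(data);
            saver.setFile(new File("mandorvol.arff"));
            saver.writeBatch();
        }
    }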
  • Classifier Type
      - Why J48?
          - Easy-to-read rules are important
              - Interested in causal relationships
          - Performs well
      - [Chart: classification accuracy (roughly 40–80%) vs. data set (0–10) for rules.ZeroR, rules.OneR '-B 6', and trees.J48 with options '-C 0.25 -M 2', '-C 0.25 -B -M 2', '-R -N 3 -Q 1 -M 2', '-R -N 3 -Q 1 -B -M 2', '-S -C 0.25 -M 2', '-S -C 0.25 -B -M 2', '-S -R -N 3 -Q 1 -B -M 2', and '-U -M 2'; a training/evaluation sketch follows below]
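The deck does not show how these accuracies were produced; a plausible minimal sketch with Weka's Java API, using one of the J48 option strings from the chart, is given below. The ARFF file name, fold count, and random seed are assumptions, not details from the study.

    // Minimal sketch of training and evaluating J48 in Weka; the ARFF file name,
    // fold count, and seed are illustrative rather than taken from the study.
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainJ48 {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("mandorvol.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);         // feedback value is the class

            J48 j48 = new J48();
            j48.setOptions(Utils.splitOptions("-C 0.25 -M 2"));   // one option set from the chart

            // 10-fold cross-validation for the accuracy figure
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(j48, data, 10, new Random(1));
            System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());

            // Build on the full data set to inspect the rules and tree size
            j48.buildClassifier(data);
            System.out.println("Rules: " + j48.measureNumRules());
            System.out.println("Tree size: " + j48.measureTreeSize());
            System.out.println(j48);                               // prints the pruned decision tree
        }
    }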
  • Optimizing Trees
      - Tree Size VS Accuracy
      - Occam’s Razor
          - Fewer rules create more general trees
      - Classification accuracy
          - Too few rules may not accurately model the domain
      - Pragmatism
          - Larger trees take longer to build and use
  • Tree Pruning Effects
      - [Charts: classification accuracy (roughly 70–77%), number of rules, and tree size vs. data set (0–10) for trees.J48 with confidence factors '-C 0.05' through '-C 0.3', all with '-M 2'; a sketch of this pruning sweep follows below]
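A sketch of how such a pruning sweep could be reproduced with Weka: the confidence-factor values match the chart legend above, while the data file, fold count, and seed are assumptions.

    // Illustrative sweep over J48's pruning confidence factor, reporting the
    // accuracy / rule-count / tree-size trade-off plotted on the slide.
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class PruningSweep {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("mandorvol.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            for (float c : new float[] {0.05f, 0.1f, 0.15f, 0.2f, 0.25f, 0.3f}) {
                J48 j48 = new J48();
                j48.setConfidenceFactor(c);   // lower values prune more aggressively
                j48.setMinNumObj(2);          // -M 2, as in the slide's option strings

                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(j48, data, 10, new Random(1));

                j48.buildClassifier(data);
                System.out.printf("-C %.2f  accuracy=%.2f%%  rules=%.0f  treeSize=%.0f%n",
                        c, eval.pctCorrect(), j48.measureNumRules(), j48.measureTreeSize());
            }
        }
    }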
  • Results

                                  Instances           # of Rules   Tree Size   Accuracy
        Mandatory Controlled      362 (20 users)       28           55          67.33%
        Mandatory Uncontrolled    2050 (37 users)      168          329         67.32%
        Voluntary Controlled      398 (29 users)       32           61          74.18%
        Voluntary Uncontrolled    1348 (31 users)      114          221         70.10%
  • Mandatory VS Voluntary
      - Mandatory:
          - Instances: 2412 (57 users)
          - # of Rules: 200 (MC + MU = 196)
          - Tree Size: 388 (MC + MU = 384)
          - Classification Accuracy: 67.27%
      - Voluntary:
          - Instances: 1746 (60 users)
          - # of Rules: 144 (VC + VU = 146)
          - Tree Size: 273 (VC + VU = 282)
          - Classification Accuracy: 70.54%
  • Controlled VS Uncontrolled
      - Controlled:
          - Instances: 760 (49 users)
          - # of Rules: 57 (MC + VC = 60)
          - Tree Size: 105 (MC + VC = 116)
          - Classification Accuracy: 68.55%
      - Uncontrolled:
          - Instances: 3398 (68 users)
          - # of Rules: 300 (MU + VU = 282)
          - Tree Size: 555 (MU + VU = 550)
          - Classification Accuracy: 67.27%
  • Rough Conclusions
      - Accuracy in a given domain is limited by the lowest accuracy in the pair of datasets
          - E.g., VC = 74.18%, VU = 70.10%, V = 70.54%
      - Domain trees seem to be the union of the trees for both pairs of datasets
      - Voluntary classifiers > Mandatory classifiers
          - Voluntary data is higher quality (supports H1)
      - Controlled classifiers > Uncontrolled classifiers
          - Controlled search results are better defined
  • Future Work
      - My Study:
          - Finish analysis with Weka
          - Investigate rules more thoroughly
          - Account for observed classification accuracies
          - Develop solid conclusions
      - Other Studies:
          - Investigate better voluntary feedback mechanisms
          - More diversified population
          - Try non-Web browser context
  • Conclusions
      - Choice of feedback mechanism affects data quantity
          - Probably affects data quality
      - Search domain affects feedback values & data quantity
          - Task-oriented VS leisurely browsing
      - Questions?
  • Acknowledgements
      - Many thanks are extended to:
          - Prof. Brown
          - Prof. Claypool
          - The NUI group at Microsoft