Thesis Presentation

The presentation I gave to the WPI Computer Science department in defense of my Master's thesis.

Transcript

  • 1. Evaluating User Feedback Systems
    Kevin Menard, Worcester Polytechnic Institute, April 13, 2005
  • 2. Problem
    - Lots of information split over many documents
    - Search engines are now a necessity
    - Search engines are "dumb"
      - Document relevance is a mathematical formula, not a user rating
      - Easy to fool
      - Hard to find good info if in a "non-conforming" format
    - Users know relevance values but can't be bothered
  • 3. Solution
    - Use implicit user behavior in place of explicit feedback ratings
    - WPI Curious Browser
      - Discovered a set of implicit indicators that highly correlated with feedback values
    - Microsoft Curious Browser
      - Built upon WPI work and collected user feedback
      - Used to train a classifier with explicit & implicit data to provide predictions of web page relevance
  • 4. Our Work
    - Investigate value of "voluntary" data
      - Previous work only used "mandatory" data
    - Mandorvol Browser
      - Extension of MS Curious Browser
      - Collects data using both voluntary & mandatory feedback mechanisms
      - Collects data in controlled & uncontrolled scenarios
  • 5. Mandorvol Browser
    - Uncontrolled scenario
      - User simply searches for anything on Google
    - Controlled scenario
      - User is given Excel tasks to complete
      - Most people have experience with it, but it's complex enough that tasks can be chosen that will require help
      - Search is limited to Excel help assets
      - Search is performed via a custom Java web application that provides a Google-like interface to Excel help assets
  • 6. Informal Hypotheses
    - H1: Quality of voluntary data will be higher
      - Users will only offer feedback if they want to
      - Good for classifiers
    - H2: Quantity of mandatory data will be greater
      - Users must provide feedback for each page
      - Also good for classifiers
    - H3: Quantity of controlled data will be lower
      - Users completing tasks don't want to be bothered
  • 7. Timeline
    - 2004
      - Development: Aug. – Nov.
      - Pilot Studies: Nov.
      - Dev, Testing, Deployment: Dec. – Feb.
    - 2005
      - Major Study: March – April
      - Rudimentary Analysis: April – May
      - Detailed Analysis: Sep. – Dec.
    - 2006
      - Conclusions & Thesis Write-up: Jan. – April
  • 8. Pilot Studies
    - Goals
      - Test voluntary feedback mechanism
      - Test tasks for controlled situations
    - Key observations
      - Feedback band location matters
        - Horizontal vs. vertical
      - "Banner ad" effect
        - Vertical band with bright colors
      - Double evaluation
      - Task-oriented users don't provide feedback once they solve their problem
  • 9. Study
    - Ran for two months in two phases
    - 161 total users across four experiment types
      - Mandatory Controlled (28)
      - Mandatory Uncontrolled (45)
      - Voluntary Controlled (48)
      - Voluntary Uncontrolled (40)
    - Users per condition as a share of the 161 total:
                   Controlled   Uncontrolled
      Mandatory    17.39%       27.95%
      Voluntary    29.81%       24.84%
  • 10. Feedback
    - Feedback Ratio: amount of feedback / # of search results
                   Controlled    Uncontrolled
      Mandatory    0.946043165   0.977690289
      Voluntary    0.745762712   0.918149466
    - Feedback Opportunities: amount of feedback / # of opportunities to give feedback
                   Controlled    Uncontrolled
      Mandatory    0.626190476   0.573518091
      Voluntary    0.408668731   0.606345476
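Both metrics on this slide are plain ratios over per-condition tallies. A minimal sketch of the arithmetic, with purely hypothetical counts (none of the numbers below come from the study data):

```java
// Sketch of the slide-10 metrics; all counts are illustrative placeholders.
public class FeedbackMetrics {
    public static void main(String[] args) {
        int feedbackCount = 250;   // feedback ratings collected (hypothetical)
        int searchResults = 270;   // search results viewed (hypothetical)
        int opportunities = 410;   // chances to give feedback (hypothetical)

        double feedbackRatio = (double) feedbackCount / searchResults;
        double opportunityRatio = (double) feedbackCount / opportunities;

        System.out.printf("Feedback Ratio: %.3f%n", feedbackRatio);
        System.out.printf("Feedback Opportunities: %.3f%n", opportunityRatio);
    }
}
```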
  • 11. Feedback Distribution
    - Normalized; "no feedback" responses not considered
                              Satisfied   Partially Satisfied   Dissatisfied
      Mandatory Controlled    29.66%      23.57%                46.77%
      Mandatory Uncontrolled  46.85%      22.28%                30.87%
      Voluntary Controlled    50.76%      16.67%                32.58%
      Voluntary Uncontrolled  49.42%      21.71%                28.88%
  • 12. Feedback Distribution (Cont.)
    - Not normalized; "no feedback" values included
                              Satisfied   Partially Satisfied   Dissatisfied   No Feedback
      Mandatory Controlled    28.06%      22.30%                44.24%         5.40%
      Mandatory Uncontrolled  45.80%      21.78%                30.18%         2.23%
      Voluntary Controlled    37.85%      12.43%                24.29%         25.42%
      Voluntary Uncontrolled  45.37%      19.93%                26.51%         8.19%
  • 13. High-level Analysis
    - A distinctive voluntary feedback mechanism yields high-quantity feedback
    - Data could be skewed by the nature of the study
      - Users more apt to give feedback when searching leisurely in a known domain
        - E.g., if I search for "drums", I know what to expect in the search results list and can better evaluate them
      - Users more apt to give Satisfied feedback when searching leisurely
  • 14. In-depth Analysis
    - What?
      - Build decision trees to investigate data qualities
    - How?
      - Weka, an open-source machine learning tool
    - Why?
      - Similar to previous work, which provides validation
      - Relates back to the original problem of improving search results
  • 15. Decision Trees
    - Example tree: the root splits on PagePosition (≤ 1 vs. > 1), with further splits on LinkTextLength and PagePosition (≤ 5 vs. > 5) leading to leaves such as Satisfied
    - Each path through the tree reads as a rule:
      - (PagePosition ≤ 1) ∧ (LinkTextLength ≤ 5) ⇒ Satisfied
      - (PagePosition ≤ 1) ∧ (LinkTextLength > 5) ∧ … ⇒ …
      - (1 < PagePosition ≤ 5) ∧ … ⇒ …
      - (PagePosition > 5) ∧ … ⇒ …
  • 16. Data Preparation
    - Data pulled from the DB and turned into a Weka file
    - 14 data attributes
      - Behavior type, behavior URL length, dwell time, page count in session, page order in search result list, page order in all search result lists, search result URL length, link text length, page description length, script length, file size, image count, exit type, feedback value
    - Allowed J48 to handle continuous data
    - Allowed J48 to handle missing values
      - Script length, file size, image count, & exit type only
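A minimal sketch of this preparation step using the Weka Java API. The file name feedback.arff and the assumption that the feedback value was exported as the last attribute are both illustrative, not details from the thesis:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadFeedbackData {
    public static void main(String[] args) throws Exception {
        // "feedback.arff" is an assumed name for the exported Weka file.
        Instances data = new DataSource("feedback.arff").getDataSet();
        // Assume the feedback value is the last column; J48 will predict it.
        data.setClassIndex(data.numAttributes() - 1);
        System.out.println(data.numInstances() + " instances, "
                + data.numAttributes() + " attributes");
        // No discretization or imputation step is needed: J48 handles
        // continuous attributes and missing values (script length, file size,
        // image count, exit type) natively.
    }
}
```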
  • 17. Classifier Type
    - Why J48?
      - Easy-to-read rules are important
      - Interested in causal relationships
      - Performs well
    - [Chart: classification accuracy per data set for rules.ZeroR '', rules.OneR '-B 6', and several trees.J48 configurations ('-C 0.25 -M 2', '-C 0.25 -B -M 2', '-R -N 3 -Q 1 -M 2', '-R -N 3 -Q 1 -B -M 2', '-S -C 0.25 -M 2', '-S -C 0.25 -B -M 2', '-S -R -N 3 -Q 1 -B -M 2', '-U -M 2')]
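For reference, a hedged sketch of how one of the plotted J48 configurations could be run through Weka's Java API. Only the '-C 0.25 -M 2' option string comes from the chart legend; the file name and the 10-fold cross-validation protocol are assumptions:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class EvaluateJ48 {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("feedback.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // One of the J48 configurations from the chart legend.
        J48 tree = new J48();
        tree.setOptions(Utils.splitOptions("-C 0.25 -M 2"));

        // Estimate classification accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.printf("Classification accuracy: %.2f%%%n", eval.pctCorrect());

        // Build on the full data set to print the easy-to-read rules.
        tree.buildClassifier(data);
        System.out.println(tree);
    }
}
```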
  • 18. Optimizing Trees
    - Tree size vs. accuracy
    - Occam's Razor
      - Fewer rules create more general trees
    - Classification accuracy
      - Too few rules may not accurately model the domain
    - Pragmatism
      - Larger trees take longer to build and use
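This trade-off can be explored by sweeping J48's pruning confidence factor, which is what the next two slides chart. A minimal sketch under the same assumptions as the snippet above (ARFF file name, 10-fold cross-validation); only the -C values and the '-M 2' setting mirror the slide legends:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PruningSweep {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("feedback.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Lower confidence factors prune more aggressively, yielding smaller,
        // more general trees; higher values keep more rules.
        for (float c : new float[] {0.05f, 0.10f, 0.15f, 0.20f, 0.25f, 0.30f}) {
            J48 tree = new J48();
            tree.setConfidenceFactor(c);
            tree.setMinNumObj(2);  // the '-M 2' setting from the charts

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));

            tree.buildClassifier(data);  // build once more to measure the tree
            System.out.printf("-C %.2f: accuracy %.2f%%, tree size %.0f, rules %.0f%n",
                    c, eval.pctCorrect(), tree.measureTreeSize(), tree.measureNumRules());
        }
    }
}
```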
  • 19. Tree Pruning Effects – Classification Accuracy
    - [Chart: classification accuracy (%) per data set for trees.J48 with '-C 0.05', '-C 0.1', '-C 0.15', '-C 0.2', '-C 0.25', and '-C 0.3' (all with '-M 2')]
  • 20. Tree Pruning Effects – Number of Rules
    - [Chart: number of rules per data set for the same trees.J48 pruning settings ('-C 0.05' through '-C 0.3', all with '-M 2')]
  • 21. Results
                              Instances         # of Rules   Tree Size   Accuracy
      Mandatory Controlled    362 (20 users)    28           55          67.33%
      Mandatory Uncontrolled  2050 (37 users)   168          329         67.32%
      Voluntary Controlled    398 (29 users)    32           61          74.18%
      Voluntary Uncontrolled  1348 (31 users)   114          221         70.10%
  • 22. Daily Classifier Results
    - [Charts: % correct per day over roughly 35 days for the Mandatory Uncontrolled and Voluntary Uncontrolled classifiers, each plotted with the average % correct, ±1 standard deviation, and 0.05 confidence intervals (S.B.* and S.W.**)]
  • 23. Conclusions
    - Mandatory feedback mechanism collects more data (supports H2)
      - May not be important: the voluntary feedback mechanism collects "enough"
    - Voluntary classifiers > mandatory classifiers
      - Voluntary data is higher quality (supports H1)
    - Controlled classifiers > uncontrolled classifiers
      - Controlled search results are better defined
    - Search domain affects feedback values & data quantity
      - Task-oriented vs. leisurely browsing
    - Controlled collects less data than uncontrolled (supports H3, although only with the voluntary feedback mechanism)
  • 24. Future Work
    - Investigate better voluntary feedback mechanisms
    - More diversified population
    - Try a non-Web browser context
  • 25. Daily Experiment Growth
    - [Chart: total number of participants by day over days 1–33 of the study]
  • 26. Acknowledgements
    - Many thanks are extended to:
      - Prof. Brown
      - Prof. Claypool
      - Prof. Pollice
      - The NUI group at Microsoft
      - The CCC staff
      - Melanie Bolduc
      - Friends & family