Eastwood presentation on Kelly et al. (2010)

  • Speaker notes: “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to U.S. News & World Report, it has the #2 library science graduate school in the nation, a very strong program. Xun Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.
  • Transcript

    • 1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance
      A presentation by Meg Eastwood
      on the 2010 paper by D. Kelly, X. Fu, and C. Shah
      INF 384H
      September 26th, 2011
      1
    • 2. Diane Kelly
      Associate Professor, School of Library and Information Science, UNC Chapel Hill
      • Education:
      • 3. Ph.D., Rutgers University (Information Science)
      • 4. MLS, Rutgers University (Information Retrieval)
      • 5. BA, University of Alabama (Psychology and English)
      • 6. Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
      2
    • 7. Primary Aim of Research
      “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
      3
    • 8. Secondary Aim of Research
      “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
      4
    • 9. Previous Experimental Protocols
      Traditional lab-based
      Naturalistic
      TREC Interactive Track
      Study entire search episodes
      Thomas and Hawking (2006)
      Trade control for “ecological validity”
      5
      Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
    • 10. Literature Review
      Main criticisms of previous studies:
      Evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments
      Users not provided with explicit instructions
      Users may have been fatigued
      Low sample sizes
      6
    • 11. Methods
      7
    • 12. Studies 1 and 2:
      effect of position of relevant documents on users’ evaluations of system performance
      Study 3:
      effect of number of relevant documents
      8
    • 13. 9
      Participants were asked to help researchers evaluate four search engines
      For each search engine, participants read a topic and posed one query
    • 14. 10
      After issuing a query, all participants were redirected to the same results page with 10 standardized results
    • 15. 11
      Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance
    • 16. 12
      After evaluating all the documents on the results page, participants were asked to evaluate the search engine
    • 17. Study 1
      Operationalized average precision at n
      Subjects required to evaluate all 10 documents
      13
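      Average precision at n is not defined on the slide, so the snippet below is only a minimal sketch of the standard formulation, using hypothetical binary relevance lists rather than anything from the study; it shows why the position of relevant documents changes the score.

```python
def average_precision_at_n(relevance, n=10):
    """Average precision over the top n results for binary relevance judgments.

    relevance: 0/1 judgments in ranked order (1 = relevant).
    Precision is taken at each rank holding a relevant document and then
    averaged over the relevant documents found in the top n.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

# Three relevant documents placed early vs. late in a 10-result list
# (hypothetical lists, not the study's actual result pages):
print(average_precision_at_n([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))  # 1.0
print(average_precision_at_n([0, 0, 0, 0, 0, 0, 0, 1, 1, 1]))  # ~0.22
```

      The same three relevant documents score 1.0 when ranked first and roughly 0.22 when ranked last, which is the kind of position effect Studies 1 and 2 manipulate.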
    • 18. Study 2
      Also operationalized average precision at n
      Subjects instructed to find five relevant documents
      14
    • 19. Study 3 – Operationalized Precision at n
      15
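      Precision at n, by contrast, ignores position. A minimal sketch with hypothetical lists (not the study’s documents) shows why varying the number of relevant documents, as in Study 3, is what moves this measure:

```python
def precision_at_n(relevance, n=10):
    """Fraction of the top-n results judged relevant (binary judgments)."""
    top = relevance[:n]
    return sum(top) / len(top) if top else 0.0

# Same three relevant documents, early or late: P@10 is identical.
print(precision_at_n([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))  # 0.3
print(precision_at_n([0, 0, 0, 0, 0, 0, 0, 1, 1, 1]))  # 0.3
# Changing how many of the 10 documents are relevant does change it.
print(precision_at_n([1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))  # 0.6
```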
    • 20. Topics and Documents
      16
      Selected topics associated with newspaper articles about current events
      Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
    • 21. Study Participants
      17
      “Convenient sample” (pg 9:27) of undergraduates from UNC
      27 participants for each study (1–3)
      Demographic information collected:
      Sex
      Age
      Major
      Search experience
      Search frequency
    • 22. Results
      Relevance Assessments
      18
    • 23. Did users’ relevance judgments agree with baseline assessments?
      19
    • 24. Did users’ relevance judgments agree with baseline assessments?
      20
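      The slide does not show which agreement statistic the authors report, so the following is only a hypothetical illustration of one simple way to quantify agreement, per participant, between user judgments and the baseline assessments of the ten standardized documents:

```python
def percent_agreement(user_judgments, baseline_judgments):
    """Share of documents where the user's binary judgment matches the baseline."""
    pairs = list(zip(user_judgments, baseline_judgments))
    return sum(u == b for u, b in pairs) / len(pairs)

# Hypothetical judgments for the 10 standardized results (1 = relevant).
baseline = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
user     = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
print(percent_agreement(user, baseline))  # 0.8
```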
    • 25. Did the topic affect differences in relevance assessments?
      21
    • 26. How much did relevance assessments vary between documents?
      22
    • 27. Results
      Evaluations of
      System Performance
      23
    • 28. Did participants modify evaluation ratings?
      24
    • 29. Participant ratings compared between performance levels and studies
      25
    • 30. Participant ratings compared between performance levels and studies
      26
      Study 1 showed no significant differences in ratings according to performance level
    • 31. Participant ratings compared between performance levels and studies
      27
      Studies 2 and 3 did show significant differences in ratings according to performance level
    • 32. What are the differences between study 1 and study 2?
      Intended difference:
      Completion time?
      28
    • 33. What are the differences between study 1 and study 2?
      Unintended differences:
      Instructions for study 2 provided clearer performance objective
      Subjects felt more successful in study 2?
      29
    • 34. User Experienced Precision
      30
      “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
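      In other words, the precision a subject actually experienced, computed from that subject’s own relevance judgments, did not always match the precision level the researchers built into the results page. A small hypothetical illustration (not the authors’ data or code):

```python
def precision_at_10(judgments):
    """Precision at 10 over a list of binary relevance judgments."""
    return sum(judgments[:10]) / 10

# Intended relevance of the 10 standardized documents (intended P@10 = 0.6).
intended    = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# One subject's own judgments: they reject one of the planted relevant documents,
# so the precision this subject experienced drops below the manipulated level.
experienced = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]

print(precision_at_10(intended))     # 0.6
print(precision_at_10(experienced))  # 0.5
```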
    • 35. Are user-experienced precision values correlated with user ratings of system performance?
      31
    • 36. Are user-experienced precision values correlated with user ratings of system performance?
      32
    • 37. Regression analysis: can you use experienced precision to predict user evaluation?
      33
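      The slide does not reproduce the statistics, so the sketch below only illustrates the general shape of such an analysis: a Pearson correlation (slides 35–36) and a simple linear regression of evaluation rating on experienced precision (slide 37), over made-up values; scipy is assumed to be available.

```python
from scipy.stats import linregress, pearsonr

# Hypothetical per-session values: user-experienced precision at 10 and the
# subject's rating of the "search engine" (values invented for illustration).
experienced_precision = [0.2, 0.3, 0.3, 0.5, 0.6, 0.6, 0.8, 0.9]
user_rating           = [2,   3,   2,   4,   5,   4,   6,   7]

# Are experienced precision and ratings correlated?
r, p = pearsonr(experienced_precision, user_rating)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")

# Can experienced precision predict the evaluation rating?
fit = linregress(experienced_precision, user_rating)
print(f"rating ~ {fit.slope:.2f} * precision + {fit.intercept:.2f} (R^2 = {fit.rvalue ** 2:.2f})")
```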
    • 38. Authors’ Discussion and Conclusions
      “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26)
      Thoughtful analysis of experimental caveats and generalizability of results
      Convenient sample of students
      Only one genre of documents represented
      Are these results specific to informational/exploratory tasks?
      34
    • 39. Suggested Class Discussion Topics
      Areas where the experiment may have been too tightly controlled/artificial:
      Controlling order in which users could rate documents?
      Areas where the experiment may not have been as controlled as the authors intended:
      Allowing subjects to formulate own queries
      Study 2 allowed participants to feel “successful”?
      Ten-point evaluation scale versus five-point evaluation scale?
      35
    • 40. References
      Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597
      36
