Speaker note (slide 2): “My research is focused on information search behavior and the design and evaluation of systems that support interactive information retrieval.” UNC Chapel Hill: according to US News and World Report, it has the #2 library science graduate school in the nation, a very strong program. Xin Fu and Chirag Shah were Ph.D. students in the program at the time this article was written.
Transcript: Eastwood presentation on_kellyetal2010
1. Effects of Position and Number of Relevant Documents on Users’ Evaluations of System Performance
   A presentation by Meg Eastwood on the 2010 paper by D. Kelly, X. Fu, and C. Shah
   INF 384H, September 26th, 2011
2. Diane Kelly
   Associate Professor, School of Library and Information Science, UNC Chapel Hill
   Education:
3. Ph.D., Rutgers University (Information Science)
4. MLS, Rutgers University (Information Retrieval)
5. BA, University of Alabama (Psychology and English)
6. Graduate Certificate in Cognitive Science, Rutgers Center for Cognitive Science
7. Primary Aim of Research
   “to investigate the relationship between actual system performance and users’ evaluations of system performance” (pg 9:2)
8. Secondary Aim of Research
   “to develop an experimental method that can be used to isolate and study specific aspects of the search process” (pg 9:2)
9. Previous Experimental Protocols
   Traditional lab-based (e.g., TREC Interactive Track): study entire search episodes
   Naturalistic (e.g., Thomas and Hawking, 2006): trade control for “ecological validity”
   Both designs include so many variables that it can be “difficult to establish causal relationships” (pg 9:2)
10. Literature Review
    Main criticisms of previous studies:
    Evaluation measures were calculated based on TREC assessors’ relevance judgments, not user judgments
    Users were not provided with explicit instructions
    Users may have been fatigued
    Low sample sizes
11. Methods
12. Studies 1 and 2: effect of the position of relevant documents on users’ evaluations of system performance
    Study 3: effect of the number of relevant documents
13. Participants were asked to help researchers evaluate four search engines
    For each search engine, participants read a topic and posed one query
14. After issuing the query, all participants were redirected to the same results page with 10 standardized results
15. Participants were asked to evaluate the full text of each search result in the order presented and judge its relevance
16. After evaluating all the documents on the results page, participants were asked to evaluate the search engine
17. Study 1
    Operationalized average precision at n
    Subjects were required to evaluate all 10 documents
18. Study 2
    Also operationalized average precision at n
    Subjects were instructed to find five relevant documents
19. Study 3 – Operationalized Precision at n
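The distinction behind slides 17–19 is that precision at n depends only on how many of the top n documents are relevant, while average precision at n also depends on where in the ranking they appear; this is what lets Studies 1 and 2 manipulate position while Study 3 manipulates the number of relevant documents. The sketch below is not the authors' code, only a minimal illustration of one standard formulation of the two measures, using made-up rankings:

```python
def precision_at_n(rels, n):
    """Fraction of the top n results judged relevant (rels is a 0/1 list)."""
    return sum(rels[:n]) / n

def average_precision_at_n(rels, n):
    """Mean of precision@k over the ranks k (within the top n) holding a relevant
    document. One standard formulation; the paper's exact operationalization may differ."""
    hits, total = 0, 0.0
    for k, rel in enumerate(rels[:n], start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

# Two hypothetical result lists with the same number of relevant documents,
# so precision at 10 is identical, but the positions differ, so average
# precision at 10 is not.
top_heavy    = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
bottom_heavy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(precision_at_n(top_heavy, 10), precision_at_n(bottom_heavy, 10))  # 0.5 0.5
print(average_precision_at_n(top_heavy, 10))                            # 1.0
print(average_precision_at_n(bottom_heavy, 10))                         # ~0.35
```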
20. Topics and Documents
    Selected topics associated with newspaper articles about current events
    Selected documents with “high probability of being judged relevant or not relevant” (pg 9:12)
21. Study Participants
    “Convenient sample” (pg 9:27) of undergraduates from UNC
    27 participants for each study (1–3)
    Demographic information collected: sex, age, major, search experience, search frequency
22. Results: Relevance Assessments
23. Did users’ relevance judgments agree with baseline assessments?
24. Did users’ relevance judgments agree with baseline assessments?
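Slides 23 and 24 compare participants' binary relevance judgments against the baseline assessments. As a hedged illustration only (the agreement statistic shown here, simple percent agreement, is an assumption; the slides themselves only pose the question), the comparison could be computed like this:

```python
def percent_agreement(user_judgments, baseline_judgments):
    """Share of documents on which the user's binary relevance judgment
    matches the baseline assessment (both lists are 0/1, in the same order)."""
    matches = sum(u == b for u, b in zip(user_judgments, baseline_judgments))
    return matches / len(baseline_judgments)

# Hypothetical example: a participant disagrees with the baseline on 2 of 10 documents.
baseline = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
user     = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(percent_agreement(user, baseline))  # 0.8
```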
25. Did the topic affect differences in relevance assessments?
26. How much did relevance assessments vary between documents?
27. Results: Evaluations of System Performance
28. Did participants modify evaluation ratings?
29. Participant ratings compared between performance levels and studies
30. Participant ratings compared between performance levels and studies
    Study 1 showed no significant differences in ratings according to performance level
31. Participant ratings compared between performance levels and studies
    Studies 2 and 3 did show significant differences in ratings according to performance level
32. What are the differences between study 1 and study 2?
    Intended difference:
    Completion time?
33. What are the differences between study 1 and study 2?
    Unintended differences:
    Instructions for study 2 provided a clearer performance objective
    Subjects felt more successful in study 2?
34. User Experienced Precision
    “experimental manipulations [of precision] were only 90% effective” (pg 9:24)
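The 90% figure reflects that participants' own judgments did not always match the baseline, so the precision a participant actually experienced could differ from the precision the researchers built into the result list. A minimal sketch of that idea, with made-up judgment vectors (not the authors' code):

```python
def precision_from_judgments(judgments):
    """Precision at 10 computed from a 0/1 judgment vector over the 10 results."""
    return sum(judgments) / len(judgments)

# Hypothetical example: the list was constructed so that 6 of the 10 documents
# are relevant according to the baseline, but this participant judged only 5 relevant.
baseline_judgments = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]   # intended precision = 0.6
user_judgments     = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]   # experienced precision = 0.5

print(precision_from_judgments(baseline_judgments))  # 0.6 (intended by the design)
print(precision_from_judgments(user_judgments))      # 0.5 (experienced by the user)
```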
35. Are user-experienced precision values correlated with user ratings of system performance?
36. Are user-experienced precision values correlated with user ratings of system performance?
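Slides 35 and 36 ask whether experienced precision and performance ratings move together. As an illustration only, assuming a Pearson correlation on made-up data (the slides do not state which coefficient the authors report), the relationship could be quantified like this:

```python
import numpy as np

# Hypothetical (made-up) per-participant values: experienced precision at 10
# and the corresponding rating of the search engine's performance.
experienced_precision = np.array([0.2, 0.3, 0.5, 0.6, 0.8, 0.9])
performance_rating    = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(experienced_precision, performance_rating)[0, 1]
print(round(r, 3))  # ~0.995 for this made-up data
```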
37. Regression analysis: can you use experienced precision to predict user evaluation?
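A minimal sketch of that kind of analysis, assuming a one-predictor ordinary least squares regression of rating on experienced precision; the data are made up and the model choice is an assumption for illustration, not the paper's actual analysis:

```python
import numpy as np

# Reuse the hypothetical data from the correlation sketch above.
experienced_precision = np.array([0.2, 0.3, 0.5, 0.6, 0.8, 0.9])
performance_rating    = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Fit rating = slope * precision + intercept by ordinary least squares.
slope, intercept = np.polyfit(experienced_precision, performance_rating, 1)

# Predict the rating a participant might give after experiencing precision 0.7.
predicted = slope * 0.7 + intercept
print(slope, intercept, predicted)  # ~6.8, ~0.76, ~5.52 for this made-up data
```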
38. Authors’ Discussion and Conclusions
    “…variations in precision at 10 scores have the greatest impact on subjects’ evaluation ratings.” (pg 9:26)
    Thoughtful analysis of experimental caveats and the generalizability of results:
    Convenient sample of students
    Only one genre of documents represented
    Are these results specific to informational/exploratory tasks?
39. Suggested Class Discussion Topics
    Areas where the experiment may have been too tightly controlled/artificial:
    Controlling the order in which users could rate documents?
    Areas where the experiment may not have been as controlled as the authors intended:
    Allowing subjects to formulate their own queries
    Study 2 allowed participants to feel “successful”?
    Ten-point evaluation scale versus five-point evaluation scale?
40. References
    Kelly, D., Fu, X., and Shah, C. 2010. Effects of position and number of relevant documents retrieved on users’ evaluations of system performance. ACM Trans. Inf. Syst. 28, 2, Article 9 (May 2010), 29 pages. DOI 10.1145/1740592.1740597. http://doi.acm.org/10.1145/1740592.1740597