New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

We introduce two new metrics for the evaluation of search effectiveness for informally structured speech data: mean average segment precision (MASP), which measures retrieval performance in terms of both content segmentation and ranking with respect to relevance; and mean average segment distance-weighted precision (MASDWP), which takes into account the distance between the start of the relevant segment and the retrieved segment. We demonstrate the effectiveness of these new metrics on a retrieval test collection based on the AMI meeting corpus.


  1. New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval. Maria Eskevich¹, Walid Magdy²,³, Gareth J.F. Jones¹,²; ¹Centre for Digital Video Processing and ²Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland; ³Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar. April 3, 2012.
  2. Outline
     - Speech Retrieval
     - Speech Search Evaluation: Mean Average Precision (MAP); Mean Average interpolated Precision (MAiP); mean Generalized Average Precision (mGAP)
     - New Metrics: Mean Average Segment Precision (MASP); Mean Average Segment Distance-Weighted Precision (MASDWP)
     - Retrieval Collection
     - Experimental Results
     - Conclusions
  3–7. Speech Documents Diversity: broadcast news vs. meetings (illustrated with figures on the original slides).
  8–17. Speech Retrieval pipeline (built up as a flow diagram across these slides): the speech collection (audio) is passed through an Automatic Speech Recognition system to produce a transcript; the transcript is segmented and the segments are indexed; text queries are run against the index as information requests; retrieval returns textual segments, which map back to the corresponding speech segments.
  18. Outline (section: Speech Search Evaluation).
  19–23. Related Work in Speech Search Evaluation. Retrieval units and the metrics used to evaluate them:
     - Clearly defined documents (TREC SDR): Mean Average Precision (MAP)
     - Passages (INEX): Mean Average interpolated Precision (MAiP)
     - Jump-in points (CLEF CL-SR): mean Generalized Average Precision (mGAP)
  24–26. Mean Average interpolated Precision (MAiP). Task: passage text retrieval; document relevance is not counted in a binary way. Precision at rank r is the fraction of the retrieved characters that are relevant. Average interpolated Precision (AiP) is the average of the interpolated precision scores at 101 recall levels (0.00, 0.01, ..., 1.00):

     AiP = \frac{1}{101} \sum_{x = 0.00, 0.01, \ldots, 1.00} iP[x]

     Shortcoming: averaging over characters in the transcript is not suitable for speech tasks.
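     To make the AiP computation concrete, here is a minimal Python sketch of the formula above; it is an illustration of the definition, not the official INEX evaluation tool, and the (recall, precision) run data in the example is hypothetical.

```python
def interpolated_precision(pr_points, x):
    """iP[x]: the highest precision observed at any recall >= x (0.0 if none)."""
    return max((p for recall, p in pr_points if recall >= x), default=0.0)

def aip(pr_points):
    """Average interpolated Precision over the 101 recall levels 0.00..1.00."""
    levels = [i / 100 for i in range(101)]
    return sum(interpolated_precision(pr_points, x) for x in levels) / 101

# Hypothetical run: (recall, precision) at the ranks where recall increases.
print(round(aip([(0.25, 1.0), (0.50, 0.66), (0.75, 0.60), (1.00, 0.50)]), 3))
```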
  27–31. mean Generalized Average Precision (mGAP). Task: retrieval of the jump-in points in time for relevant content.

     GAP = \frac{1}{n} \sum_{r=1}^{N} P[r] \cdot \left(1 - \frac{Distance}{Granularity} \cdot 0.1\right)

     Shortcoming: does not take into account how much time the user needs to spend listening before reaching the relevant content.
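     A minimal sketch of the GAP formula above, assuming the per-rank precision values and the distances (in seconds) between retrieved and true jump-in points are already known; the granularity of 60 seconds is an assumption chosen for illustration, not the CLEF CL-SR setting.

```python
def gap(matches, n_relevant, granularity=60.0):
    """GAP = (1/n) * sum over matched ranks of P[r] * (1 - Distance/Granularity * 0.1).

    matches: (precision_at_r, distance) pairs, one per retrieved jump-in point
    matched to a relevant one; the reward shrinks as the retrieved point drifts
    from the true start, floored at zero."""
    return sum(p * max(1.0 - (d / granularity) * 0.1, 0.0)
               for p, d in matches) / n_relevant

# Hypothetical query with 3 relevant jump-in points, 2 of them found:
print(round(gap([(1.0, 30.0), (0.5, 120.0)], n_relevant=3), 3))  # -> 0.45
```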
  32. Outline (section: New Metrics).
  33. Time Precision Oriented Metrics. Motivation: create a metric that measures both the ranking quality and the segmentation quality with respect to relevance in a single score, and that reflects how far into the segment at a given rank the user has to listen before the relevant part actually begins.
  34–39. Mean Average Segment Precision (MASP). Segment Precision SP[r] at rank r: the fraction of the total time retrieved up to rank r that is relevant (this reading matches the worked example below). Average Segment Precision:

     ASP = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r)

     where rel(s_r) = 1 if segment s_r contains relevant content and rel(s_r) = 0 otherwise, and n is the number of relevant segments for the query. Differences from other metrics: the amount of relevant content is measured over time instead of text, and ASP is calculated at the ranks of segments containing relevant content rather than at fixed recall points as in MAiP. A small computational sketch follows.
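     The following Python sketch spells out ASP as defined above, assuming each retrieved segment is given as (relevant seconds, total seconds) in rank order; this is an illustrative reading of the slide, not the authors' released scoring code.

```python
def asp(segments, n_relevant):
    """Average Segment Precision for one query.

    segments: (relevant_seconds, total_seconds) per retrieved segment, in
    rank order. SP[r] is the fraction of all time retrieved up to rank r
    that is relevant; it is accumulated only at ranks where rel(s_r) = 1."""
    rel_time = tot_time = score = 0.0
    for rel_len, total_len in segments:
        rel_time += rel_len
        tot_time += total_len
        if rel_len > 0:                    # rel(s_r) = 1
            score += rel_time / tot_time   # SP[r]
    return score / n_relevant

# MASP is then the mean of asp(...) over all queries.
```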
  40. Mean Average Segment Distance-Weighted Precision (MASDWP). Penalizes ASP with a distance factor, as mGAP does:

     ASDWP = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r) \cdot \left(1 - \frac{Distance}{Granularity} \cdot 0.1\right)
  41–51. Comparative example of AP, ASP and ASDWP, built up over six retrieved segments. "Rel Len / Total Len" is the relevant length over the total length of the segment at each rank; the factor in the ASDWP column is the mGAP-style distance weight:

     Rank | Rel Len / Total Len | AP  | ASP   | ASDWP
     -----|---------------------|-----|-------|------------
     1    | 2/3                 | 1   | 2/3   | 2/3 * 1.0
     2    | 0/5                 | 1/2 | 2/8   | 2/8 * 0.0
     3    | 3/4                 | 2/3 | 5/12  | 5/12 * 0.9
     4    | 6/6                 | 3/4 | 11/18 | 11/18 * 0.0
     5    | 0/2                 | 3/5 | 11/20 | 11/20 * 0.0
     6    | 5/10                | 4/6 | 16/30 | 16/30 * 0.0

     Resulting scores: MAP = 0.771, MASP = 0.557, MASDWP = 0.260.
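     As a check, this short script reproduces the slide's worked example: six retrieved segments, n = 4 relevant ones, and the per-rank distance weights read off the ASDWP column.

```python
segments = [(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)]  # (rel len, total len)
weights = [1.0, 0.0, 0.9, 0.0, 0.0, 0.0]  # distance weights from the slide
n = 4  # number of relevant segments

ap = asp = asdwp = 0.0
rel_docs = rel_t = tot_t = 0
for r, ((rel, tot), w) in enumerate(zip(segments, weights), start=1):
    rel_t += rel
    tot_t += tot
    if rel > 0:
        rel_docs += 1
        ap += rel_docs / r        # binary precision at rank r
        sp = rel_t / tot_t        # segment precision at rank r
        asp += sp
        asdwp += sp * w

print(round(ap / n, 3), round(asp / n, 3), round(asdwp / n, 3))
# prints: 0.771 0.557 0.26  (MAP, MASP, MASDWP for this single query)
```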
  52. Outline (section: Retrieval Collection).
  53. Test Collection. Speech collection: the AMI Corpus, ca. 100 hours of data (80 hours of speech); 160 meetings with an average length of 30 minutes. Transcripts: manual, and Automatic Speech Recognition (ASR) output with WER ≈ 30%. Retrieval test set: 25 queries whose text is taken from the PowerPoint slides provided with the AMI Corpus (average length > 10 content words), with manual relevance assessment.
  54. Segmentation Methods and Retrieval Runs. Segmentation (manual boundaries for both types of transcript):
     - Lexical-cohesion-based algorithms: TextTiling, C99
     - Time- and length-based algorithms: segment length = 60, 120, 150, 180 seconds; number of words per segment = 300, 400 (a sketch of the time-based variant follows this slide)
     - Extreme case: no segmentation
     Retrieval system: SMART, extended to use language modeling.
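     A hedged sketch of what the time-based segmentation runs ("time 60" through "time 180") plausibly do, under the assumption that the transcript comes with word-level start times; the function name and input format are illustrative, not taken from the paper.

```python
def time_segments(words, window=120):
    """Cut a time-aligned transcript into fixed-length windows.

    words: list of (token, start_time_in_seconds) in temporal order.
    window: target segment length in seconds (60/120/150/180 in the runs)."""
    segments, current, boundary = [], [], float(window)
    for token, start in words:
        if start >= boundary:
            if current:
                segments.append(current)
                current = []
            while start >= boundary:   # skip over silent stretches
                boundary += window
        current.append(token)
    if current:
        segments.append(current)
    return segments
```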
  55. Outline (section: Experimental Results).
  56–60. Score results for 1000 retrieved documents (condition labelled "asr man" on the slide):

     Run      | MAP   | MAiP  | MASP  | MASDWP
     ---------|-------|-------|-------|-------
     c99      | 0.438 | 0.275 | 0.218 | 0.177
     tt       | 0.421 | 0.275 | 0.221 | 0.173
     len 300  | 0.416 | 0.287 | 0.248 | 0.181
     len 400  | 0.463 | 0.286 | 0.237 | 0.147
     time 120 | 0.428 | 0.296 | 0.256 | 0.196
     time 150 | 0.448 | 0.283 | 0.243 | 0.171
     time 180 | 0.473 | 0.300 | 0.246 | 0.163
     time 60  | 0.333 | 0.259 | 0.238 | 0.220
     one doc  | 0.686 | 0.109 | 0.085 | 0.009

     Observations: the "one doc" run has the highest MAP but the lowest score on every other metric, which contradicts the user experience. The "time 60" run scores highest on MASDWP: the shorter average segment length makes it easier to capture a segment close to the jump-in point.
  61–63. Capturing the difference between segmentations. The segment retrieved at each rank is shown as relevant length / total length in seconds, with the distance to the true jump-in point in parentheses (placement of the sparse cells inferred from the segment lengths):

     Rank | c99           | time 180      | time 60
     -----|---------------|---------------|-------------
     3    |               | 179/179 (–)   | 60/60 (–)
     4    | 243/243 (–)   | 179/179 (–)   | 59/59 (1)
     5    |               | 180/180 (-69) | 60/60 (–)
     6    | 105/125 (20)  |               | 59/59 (-10)
     7    | 157/204 (47)  | 179/179 (0)   | 59/59 (–)
     8    | 107/107 (-45) | 59/179        | 60/60 (–)
     9    | 350/429 (47)  | 162/180 (-4)  | 60/60 (21)
     10   | 122/122 (-11) | 143/181 (–)   |

     Resulting system orderings:
     AP:    one doc > time 180 > c99 > time 60
     AiP:   c99 > time 180 > time 60 > one doc
     ASP:   time 180 > c99 > time 60 > one doc
     ASDWP: c99 > time 180 > time 60 > one doc
  64–67. Impact of Averaging Techniques (plots on the original slides). AiP: man < asr man; ASP: man > asr man (relevant content moves down from the higher ranks).
  68. Outline (section: Conclusions).
  69–71. Conclusions. MAP and MAiP do not reflect the user experience of informally structured speech documents: MAP is appropriate for clearly defined documents, and MAiP works over transcript characters. We introduced MASP and MASDWP: MASP captures the amount of relevant content that appears at different ranks; MASDWP additionally rewards runs where the segmentation algorithm puts boundaries closer to the relevant content and those segments are ranked higher.
  72. Thank you for your attention!
