New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

We introduce two new metrics for the evaluation of search effectiveness for informally structured speech data: mean average segment precision (MASP) which measures retrieval performance in terms of both content segmentation and ranking with respect to relevance; and mean average segment distance-weighted precision (MASDWP) which takes into account the distance between the start of the relevant segment and the retrieved segment. We demonstrate the effectiveness of these new metrics on a retrieval test collection based on the AMI meeting corpus.

Published in: Technology

Transcript

  • 1. New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval. Maria Eskevich (1), Walid Magdy (2, 3), Gareth J.F. Jones (1, 2). (1) Centre for Digital Video Processing and (2) Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland; (3) Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar. April 3, 2012.
  • 2. Outline: Speech Retrieval; Speech Search Evaluation (Mean Average Precision (MAP), Mean Average interpolated Precision (MAiP), mean Generalized Average Precision (mGAP)); New Metrics (Mean Average Segment Precision (MASP), Mean Average Segment Distance-Weighted Precision (MASDWP)); Retrieval Collection; Experimental Results; Conclusions.
  • 3-7. Speech Documents Diversity: broadcast news; meetings.
  • 8-17. Speech Retrieval (diagram, built up over several slides). Collection side: Speech Collection → Automatic Speech Recognition System → Transcript → Segmentation → Segments → Indexing → Indexed Segments. Query side: Information Request → Queries (text); the intermediate slides also show Speech Queries (audio) passing through an Automatic Speech Recognition System to give Queries (text). Retrieval over the Indexed Segments gives Retrieval Results: textual segments, and from these Retrieval Results: speech segments.
  • 19-23. Related Work in Speech Search Evaluation. Retrieval units:
      Clearly defined documents: TREC SDR, Mean Average Precision (MAP)
      Passages: INEX, Mean Average interpolated Precision (MAiP)
      Jump-in points: CLEF CL-SR, mean Generalized Average Precision (mGAP)
  • 24-26. Mean Average interpolated Precision (MAiP). Task: passage text retrieval. Document relevance is not counted in a binary way. Precision at rank r: the fraction of retrieved characters that are relevant. Average interpolated Precision (AiP): the average of the interpolated precision scores calculated at 101 recall levels (0.00, 0.01, ..., 1.00): $\mathrm{AiP} = \frac{1}{101} \sum_{x = 0.00, 0.01, \ldots, 1.00} iP[x]$. Shortcoming: averaging over characters in the transcript is not suitable for speech tasks.
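The AiP average above can be sketched in a few lines of Python. This is a minimal illustration rather than the INEX evaluation tool: it assumes that precision and recall are accumulated over characters down the ranked list, and that the interpolated precision `iP[x]` is the highest precision reached at any rank whose recall is at least `x` (0 if that recall is never reached). The function name and the input layout are hypothetical.

```python
def average_interpolated_precision(passages, total_relevant_chars):
    """AiP for one query.

    passages: ranked list of (relevant_chars, retrieved_chars) pairs,
              one per retrieved passage (hypothetical input layout).
    total_relevant_chars: number of relevant characters in the collection
              for this query (assumed to be > 0).
    """
    points = []  # (recall, precision) after each rank
    rel_so_far = 0
    ret_so_far = 0
    for rel_chars, ret_chars in passages:
        rel_so_far += rel_chars
        ret_so_far += ret_chars
        precision = rel_so_far / ret_so_far
        recall = rel_so_far / total_relevant_chars
        points.append((recall, precision))

    total = 0.0
    for i in range(101):  # recall levels 0.00, 0.01, ..., 1.00
        level = i / 100
        # Assumed interpolation rule: best precision at recall >= level.
        reachable = [p for r, p in points if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 101
```

MAiP would then be the mean of this value over all queries.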
  • 27-31. mean Generalized Average Precision (mGAP). Task: retrieval of the jump-in points in time for relevant content. $\mathrm{GAP} = \frac{1}{n} \sum_{r=1}^{N} P[r] \cdot \left(1 - \frac{\mathit{Distance}}{\mathit{Granularity}} \cdot 0.1\right)$. Shortcoming: does not take into account how much time the user needs to spend listening to access the relevant content.
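To make the distance penalty in the GAP formula concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the CLEF CL-SR scoring script: the weight is taken as a continuous function of the distance, it is clamped at 0 so that far-away jump-in points earn no credit, and results with no matching relevant content contribute nothing. The precision values `P[r]` are passed in by the caller because the slide does not spell out how they are computed. All function and parameter names are hypothetical.

```python
def distance_weight(distance, granularity):
    """Weight 1 - (Distance / Granularity) * 0.1, clamped at 0 (assumption)."""
    steps = abs(distance) / granularity
    return max(0.0, 1.0 - steps * 0.1)


def generalized_average_precision(ranked, n, granularity):
    """GAP for one query.

    ranked: list of (precision_at_rank, distance) pairs in rank order;
            distance is None when the result matches no relevant content.
    n: normalising count from the formula (assumed to be the number of
       relevant items for the query).
    granularity: size of one penalty step, in the same unit as distance.
    """
    total = 0.0
    for precision, distance in ranked:
        if distance is None:
            continue  # no relevant content at this rank, no credit
        total += precision * distance_weight(distance, granularity)
    return total / n if n else 0.0
```

For example, with a granularity of 15 seconds, a jump-in point 15 seconds away from the true start keeps 90% of its credit, and one 150 seconds away or more keeps none.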
  • 33. Time Precision Oriented Metrics. Motivation: create a metric that measures both the ranking quality and the segmentation quality with respect to relevance in a single score; reflect how far the user has to listen into the segment at a certain rank until the relevant part actually begins.
  • 34-39. Mean Average Segment Precision (MASP). Segment Precision (SP[r]) is computed at rank r. Average Segment Precision: $\mathrm{ASP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r)$, where $rel(s_r) = 1$ if relevant content is present in segment $s_r$ and $rel(s_r) = 0$ otherwise. Differences from other metrics: the amount of relevant content is measured over time instead of text; ASP is calculated at the ranks of segments containing relevant content rather than at fixed recall points as in MAiP.
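A minimal Python sketch of ASP follows. The definition of SP[r] is an assumption here: it is taken to be the relevant time retrieved up to rank r divided by the total time retrieved up to rank r, which is how the ASP column of the comparative example in the slides is built up (2/3, 2/8, 5/12, ...). When `n_relevant` is not given, n falls back to the number of retrieved segments that contain relevant content. Function and parameter names are hypothetical.

```python
def average_segment_precision(segments, n_relevant=None):
    """ASP for one query.

    segments: ranked list of (relevant_length, segment_length) pairs,
              both measured in time (e.g. seconds).
    n_relevant: normalising count n; defaults to the number of retrieved
                segments with relevant content (assumption).
    """
    rel_time = 0.0
    total_time = 0.0
    score = 0.0
    hits = 0
    for rel_len, seg_len in segments:
        rel_time += rel_len
        total_time += seg_len
        if rel_len > 0:  # rel(s_r) = 1
            score += rel_time / total_time  # SP[r], cumulative (assumed)
            hits += 1
    n = n_relevant if n_relevant is not None else hits
    return score / n if n else 0.0


def mean_average_segment_precision(per_query_segments):
    """MASP: mean of ASP over a non-empty list of queries."""
    values = [average_segment_precision(s) for s in per_query_segments]
    return sum(values) / len(values)
```

Because the score accumulates segment lengths, a run that returns long segments with little relevant content is penalised even if every segment counts as relevant in the binary sense.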
  • 40. Mean Average Segment Distance-Weighted Precision (MASDWP). ASP results are penalized in the same way as in mGAP: $\mathrm{ASDWP} = \frac{1}{n} \sum_{r=1}^{N} SP[r] \cdot rel(s_r) \cdot \left(1 - \frac{\mathit{Distance}}{\mathit{Granularity}} \cdot 0.1\right)$
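The distance-weighted variant can be sketched the same way. As before this is an illustration under assumptions, not the authors' implementation: SP[r] is taken as cumulative relevant time over cumulative retrieved time, the distance is the gap between the start of the retrieved segment and the start of the relevant content, its absolute value is used, and the weight is clamped at 0. Names and the input layout are hypothetical.

```python
def average_segment_distance_weighted_precision(segments, granularity,
                                                n_relevant=None):
    """ASDWP for one query.

    segments: ranked list of (relevant_length, segment_length, distance);
              distance is None for segments without relevant content.
    granularity: size of one penalty step, same unit as distance.
    n_relevant: normalising count n; defaults to the number of retrieved
                segments with relevant content (assumption).
    """
    rel_time = 0.0
    total_time = 0.0
    score = 0.0
    hits = 0
    for rel_len, seg_len, distance in segments:
        rel_time += rel_len
        total_time += seg_len
        if rel_len > 0 and distance is not None:  # rel(s_r) = 1
            segment_precision = rel_time / total_time  # SP[r] (assumed)
            # Penalty term from the formula, clamped at 0 (assumption).
            weight = max(0.0, 1.0 - (abs(distance) / granularity) * 0.1)
            score += segment_precision * weight
            hits += 1
    n = n_relevant if n_relevant is not None else hits
    return score / n if n else 0.0
```

A segment that starts exactly at the relevant content keeps its full SP[r]; one that starts ten or more granularity steps away contributes nothing, however much relevant content it contains.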
  • 41-51. Comparative example of AP, ASP and ASDWP.
      Retrieved   Rel Len /
      segment     Total Len   AP     ASP     ASDWP
      1           2/3         1      2/3     2/3 * 1.0
      2           0/5         1/2    2/8     2/8 * 0.0
      3           3/4         2/3    5/12    5/12 * 0.9
      4           6/6         3/4    11/18   11/18 * 0.0
      5           0/2         3/5    11/20   11/20 * 0.0
      6           5/10        4/6    16/30   16/30 * 0.0
      Resulting scores: MAP 0.771, MASP 0.557, MASDWP 0.260.
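The three scores in this example can be reproduced by hand. The working below assumes n = 4, the number of retrieved segments that contain relevant content (ranks 1, 3, 4 and 6), and sums only over those ranks; the slide labels the single-query results as MAP, MASP and MASDWP.

```latex
\begin{align*}
\mathrm{AP}    &= \tfrac{1}{4}\left(1 + \tfrac{2}{3} + \tfrac{3}{4} + \tfrac{4}{6}\right) \approx 0.771 \\
\mathrm{ASP}   &= \tfrac{1}{4}\left(\tfrac{2}{3} + \tfrac{5}{12} + \tfrac{11}{18} + \tfrac{16}{30}\right) \approx 0.557 \\
\mathrm{ASDWP} &= \tfrac{1}{4}\left(\tfrac{2}{3}\cdot 1.0 + \tfrac{5}{12}\cdot 0.9 + \tfrac{11}{18}\cdot 0.0 + \tfrac{16}{30}\cdot 0.0\right) \approx 0.260
\end{align*}
```

The drop from 0.771 to 0.557 comes from the non-relevant time inside the retrieved segments, and the further drop to 0.260 from the two relevant segments whose start lies too far from the relevant content to earn any credit.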
  • 53. Test Collection. Speech collection: AMI Corpus, ca. 100 hours of data (80 hours of speech), 160 meetings with an average length of 30 minutes. Transcripts: manual, and Automatic Speech Recognition (ASR) with WER ≈ 30%. Retrieval test set: 25 queries with text taken from PowerPoint slides provided with the AMI Corpus (average length > 10 content words). Manual relevance assessment.
  • 54. Segmentation Methods and Retrieval Runs. Segmentation*: lexical cohesion based algorithms (TextTiling, C99); time- and length-based algorithms (time length = 60, 120, 150, 180 seconds; number of words per segment = 300, 400); extreme case: no segmentation. Retrieval system: SMART extended to use language modeling. * Manual boundaries for both types of transcript.
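The time- and length-based segmentations are simple enough to sketch. The slides do not say how the boundaries were actually placed, so the following is only one plausible reading: fixed windows counted from the start of the recording, and fixed-size chunks of consecutive words. The function names and the `(start_time, token)` input layout are hypothetical.

```python
def segment_by_time(words, window_seconds):
    """Group time-stamped words into fixed-length time windows.

    words: list of (start_time_seconds, token) pairs.
    Returns a list of token lists, one per non-empty window, in time order.
    """
    windows = {}
    for start, token in words:
        index = int(start // window_seconds)  # which window the word falls in
        windows.setdefault(index, []).append(token)
    return [windows[i] for i in sorted(windows)]


def segment_by_length(tokens, words_per_segment):
    """Split a token list into consecutive chunks of a fixed word count."""
    return [tokens[i:i + words_per_segment]
            for i in range(0, len(tokens), words_per_segment)]
```

With `window_seconds` set to 60, 120, 150 or 180 and `words_per_segment` set to 300 or 400, these would correspond to the run names time 60 ... time 180 and len 300, len 400 used in the results.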
  • 56-60. Scores. Results for 1000 retrieved documents (asr man runs).
      Run        MAP     MAiP    MASP    MASDWP
      c99        0.438   0.275   0.218   0.177
      tt         0.421   0.275   0.221   0.173
      len 300    0.416   0.287   0.248   0.181
      len 400    0.463   0.286   0.237   0.147
      time 120   0.428   0.296   0.256   0.196
      time 150   0.448   0.283   0.243   0.171
      time 180   0.473   0.300   0.246   0.163
      time 60    0.333   0.259   0.238   0.220
      one doc    0.686   0.109   0.085   0.009
      one doc run: only MAP gives it the highest score; all other metrics give it the lowest score -> this contradicts user experience.
      time 60: the highest MASDWP rank -> the shorter average length of the segments makes it easier to capture a segment closer to the jump-in point.
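The two observations about the one doc and time 60 runs can be checked mechanically by sorting the runs under each metric. The scores below are copied from the results slide; the helper name `rank_runs` is hypothetical.

```python
METRICS = ("MAP", "MAiP", "MASP", "MASDWP")

# Scores copied from the results slide (asr man runs).
SCORES = {
    "c99":      (0.438, 0.275, 0.218, 0.177),
    "tt":       (0.421, 0.275, 0.221, 0.173),
    "len 300":  (0.416, 0.287, 0.248, 0.181),
    "len 400":  (0.463, 0.286, 0.237, 0.147),
    "time 120": (0.428, 0.296, 0.256, 0.196),
    "time 150": (0.448, 0.283, 0.243, 0.171),
    "time 180": (0.473, 0.300, 0.246, 0.163),
    "time 60":  (0.333, 0.259, 0.238, 0.220),
    "one doc":  (0.686, 0.109, 0.085, 0.009),
}


def rank_runs(metric):
    """Return run names ordered from best to worst under the given metric."""
    column = METRICS.index(metric)
    return sorted(SCORES, key=lambda run: SCORES[run][column], reverse=True)


for metric in METRICS:
    ordering = rank_runs(metric)
    print(f"{metric}: best = {ordering[0]}, worst = {ordering[-1]}")
```

Under MAP the unsegmented one doc run comes first, while under MAiP, MASP and MASDWP it comes last, and time 60 leads under MASDWP.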
  • 61-63. Capturing Difference Between Segmentations. Entries at ranks 3 to 10 for the c99, time 180 and time 60 runs, in slide order, with the bracketed value where the slide gives one (not every run has an entry at every rank):
      Rank 3:  179/179 (–), 60/60 (–)
      Rank 4:  243/243 (–), 179/179 (–), 59/59 (1)
      Rank 5:  180/180 (-69), 60/60 (–)
      Rank 6:  105/125 (20), 59/59 (-10)
      Rank 7:  157/204 (47), 179/179 (0), 59/59 (–)
      Rank 8:  107/107 (-45), 59/179, 60/60 (–)
      Rank 9:  350/429 (47), 162/180 (-4), 60/60 (21)
      Rank 10: 122/122 (-11), 143/181 (–)
      AP: one doc > time 180 > c99 > time 60
      AiP: c99 > time 180 > time 60 > one doc
      ASP: time 180 > c99 > time 60 > one doc
      ASDWP: c99 > time 180 > time 60 > one doc
  • 64-67. Impact of Averaging Techniques. AiP: man < asr man; ASP: man > asr man (relevant content moves down from higher ranks).
  • 69-71. Conclusions. MAP and MAiP do not reflect the user experience of informally structured speech documents: MAP is appropriate for clearly defined documents; MAiP works with transcript characters. Introduced MASP and MASDWP. MASP captures the amount of relevant content that appears at different ranks. MASDWP rewards runs where segmentation algorithms put boundaries closer to the relevant content and these segments are higher in the ranked list.
  • 72. Thank you for your attention!
