New Metrics for Meaningful Evaluation of
 Informally Structured Speech Retrieval

Maria Eskevich1 , Walid Magdy2,3 , Gareth J.F. Jones1,2

                    1Centre for Digital Video Processing
                  2 Centre for Next Generation Localisation
                             School of Computing
                    Dublin City University, Dublin, Ireland
       3   Qatar Computing Research Institute - Qatar Foundation
                             Doha, Qatar


                                April 3, 2012
Speech Retrieval   New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Outline
         Speech Retrieval

         Speech Search Evaluation
           Mean Average Precision (MAP)
           Mean Average interpolated Precision (MAiP)
           mean Generalized Average Precision (mGAP)

         New Metrics
           Mean Average Segment Precision (MASP)
           Mean Average Segment Distance-Weighted Precision
           (MASDWP)

         Retrieval Collection

         Experimental Results

         Conclusions
  Speech Documents Diversity


                   Broadcast news:




                   Meetings:


  Speech Retrieval

         [System architecture diagram]

         Speech Collection → Automatic Speech Recognition System →
         Transcript → Segmentation → Segments → Indexing →
         Indexed Segments

         Queries (audio) → Automatic Speech Recognition System →
         Queries (text) → Information Request

         Information Request + Indexed Segments → Retrieval →
         Retrieval Results: textual segments → speech segments




  Related Work in Speech Search Evaluation


         Retrieval Units:
                 Clearly defined documents:
                         TREC SDR: Mean Average Precision (MAP)
                 Passages:
                         INEX: Mean Average interpolated Precision (MAiP)
                 Jump-in points:
                         CLEF CL-SR: mean Generalized Average Precision
                         (mGAP)


  Mean Average interpolated Precision (MAiP)

         Task: passage text retrieval.
         Document relevance is not judged in a binary way.
         Precision at rank r: fraction of the retrieved characters
         that are relevant.

         Average interpolated Precision (AiP): average of interpolated
         precision scores calculated at 101 recall levels
         (0.00, 0.01, ..., 1.00):

                 AiP = (1/101) · Σ_{x = 0.00, 0.01, ..., 1.00} iP[x]

         Shortcomings: averaging over characters in the transcript is
         not suitable for speech tasks.
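The AiP computation can be sketched in Python. The interpolation rule used here (iP[x] is the highest precision observed at any recall ≥ x) is the standard one and is an assumption, since the slide does not spell it out:

```python
def interpolated_precision(points):
    """points: (recall, precision) pairs observed down a ranked list.
    iP[x] = max precision over all points with recall >= x,
    or 0 if the run never reaches recall x."""
    def ip(x):
        vals = [p for r, p in points if r >= x]
        return max(vals) if vals else 0.0
    return ip

def aip(points):
    """AiP = (1/101) * sum of iP[x] for x = 0.00, 0.01, ..., 1.00."""
    ip = interpolated_precision(points)
    return sum(ip(round(0.01 * i, 2)) for i in range(101)) / 101
```

For instance, a run reaching full recall at full precision scores 1.0, while one that stops at recall 0.5 with precision 1.0 scores 51/101 ≈ 0.505, since only the 51 levels up to 0.50 contribute.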


  mean Generalized Average Precision (mGAP)

         Task: retrieval of the jump-in points in time for relevant
         content.

                 GAP = (1/n) · Σ_{r=1}^{N} P[r] · (1 − (Distance / Granularity) · 0.1)

         Shortcomings: does not take into account how much time the
         user needs to spend listening to access the relevant content.
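A minimal sketch of the per-query GAP score under this formula. The granularity value (15 seconds here) and the flooring of the penalty at zero are illustrative assumptions, not taken from the slide:

```python
def gap(scored_ranks, n, granularity=15.0):
    """Per-query Generalized Average Precision, following the slide's
    formula.  scored_ranks: (precision_at_r, distance) pairs for ranks
    pointing near a relevant jump-in point; distance is the offset
    (same unit as granularity) between the suggested and the true start
    point; n is the number of relevant jump-in points.  The precision
    credit drops by 0.1 per granularity unit of distance and is floored
    at 0 (assumption)."""
    total = 0.0
    for p, dist in scored_ranks:
        total += p * max(0.0, 1.0 - (dist / granularity) * 0.1)
    return total / n
```

An exact hit keeps full credit, a 15-second miss keeps 90% of it, and a miss beyond ten granularity units contributes nothing.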




  Time Precision Oriented Metrics



         Motivation:

              Create a metric that measures both the ranking quality and
              the segmentation quality with respect to relevance in a
              single score.

              Reflect how far the user has to listen into the segment at a
              certain rank until the relevant part actually begins.


  Mean Average Segment Precision (MASP)

         Segment Precision (SP[r]) at rank r: fraction of the total
         length of the top r retrieved segments that is relevant,
         measured in time.

         Average Segment Precision:

                 ASP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r)

         rel(s_r) = 1 if relevant content is present in segment s_r,
         otherwise rel(s_r) = 0.

         Difference from other metrics:
               the amount of relevant content is measured over time
               instead of text
               average segment precision (ASP) is calculated at the
               ranks of segments containing relevant content,
               rather than at fixed recall points as in MAiP
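The ASP formula can be sketched as follows; segment lengths are assumed to be given in time units, and SP[r] is computed cumulatively over the top r segments:

```python
def asp(segments, n):
    """Average Segment Precision for one query (slide formula).
    segments: (relevant_len, total_len) per retrieved segment in rank
    order, lengths in time units; n: number of relevant segments.
    SP[r] = cumulative relevant length / cumulative total length over
    the top r segments; only ranks whose segment contains relevant
    content contribute (rel(s_r) = 1)."""
    cum_rel = cum_tot = 0.0
    total = 0.0
    for rel_len, tot_len in segments:
        cum_rel += rel_len
        cum_tot += tot_len
        if rel_len > 0:          # rel(s_r) = 1
            total += cum_rel / cum_tot
    return total / n
```

On the comparative example shown later in the deck, `asp([(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)], 4)` evaluates to ≈ 0.557, matching the slide's MASP figure for that ranked list.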


  Mean Average Segment Distance-Weighted Precision
  (MASDWP)

         Penalizes ASP results in the same way as mGAP:

                 ASDWP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r) · (1 − (Distance / Granularity) · 0.1)
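The same cumulative loop extends to ASDWP by discounting each contributing SP[r]. The granularity value below is an illustrative assumption:

```python
def asdwp(segments, n, granularity=15.0):
    """Average Segment Distance-Weighted Precision sketch.
    Like ASP, but each contributing SP[r] is discounted by 0.1 per
    granularity unit of distance between the segment start and the
    start of its relevant content (granularity value is an assumption).
    segments: (relevant_len, total_len, distance) per retrieved segment
    in rank order; n: number of relevant segments."""
    cum_rel = cum_tot = 0.0
    total = 0.0
    for rel_len, tot_len, dist in segments:
        cum_rel += rel_len
        cum_tot += tot_len
        if rel_len > 0:
            penalty = max(0.0, 1.0 - (dist / granularity) * 0.1)
            total += (cum_rel / cum_tot) * penalty
    return total / n
```

A segment whose relevant content starts exactly at the segment boundary keeps its full SP[r]; one starting 15 seconds in keeps 90% of it.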


  Comparative example of AP, ASP and ASDWP

         Retrieved   Rel Len /
         Segment     Total Len      AP       ASP        ASDWP
             1          2/3          1        2/3      2/3 * 1.0
             2          0/5         1/2       2/8      2/8 * 0.0
             3          3/4         2/3      5/12      5/12 * 0.9
             4          6/6         3/4     11/18     11/18 * 0.0
             5          0/2         3/5     11/20     11/20 * 0.0
             6         5/10         4/6     16/30     16/30 * 0.0

         MAP = 0.771        MASP = 0.557        MASDWP = 0.260
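The table's summary numbers can be checked with a short script: binary AP is the mean precision at the relevant ranks, and the ASDWP distance weights are taken directly from the table's last column:

```python
# Worked check of the comparative example (lengths from the slide table).
# (relevant_len, total_len) per retrieved segment, in rank order:
segs = [(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)]
# Distance weights (1 - Distance/Granularity * 0.1) per relevant rank,
# keyed by 0-based rank index, as given in the ASDWP column:
weights = {0: 1.0, 2: 0.9, 3: 0.0, 5: 0.0}

n = sum(1 for rel, _ in segs if rel > 0)          # 4 relevant segments

# AP: precision at the ranks of relevant segments (binary relevance)
hits, ap_terms = 0, []
for r, (rel, _) in enumerate(segs, start=1):
    if rel > 0:
        hits += 1
        ap_terms.append(hits / r)
ap = sum(ap_terms) / n

# ASP / ASDWP: cumulative relevant/total length at relevant ranks
cum_rel = cum_tot = 0
asp_terms, asdwp_terms = [], []
for i, (rel, tot) in enumerate(segs):
    cum_rel += rel
    cum_tot += tot
    if rel > 0:
        sp = cum_rel / cum_tot
        asp_terms.append(sp)
        asdwp_terms.append(sp * weights[i])
asp = sum(asp_terms) / n
asdwp = sum(asdwp_terms) / n
```

Running this reproduces the slide's per-query scores: ap ≈ 0.771, asp ≈ 0.557, asdwp ≈ 0.260.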




  Test Collection


         Speech collection: AMI Corpus
                 Ca. 100 hours of data (80 hours of speech)
                 160 meetings:
                     average length – 30 minutes
                 Transcripts:
                     Manual
                     Automatic Speech Recognition (ASR), WER ≈ 30%
         Retrieval test set:
                 25 queries with text taken from PowerPoint slides provided
                 with the AMI Corpus (average length > 10 content words)
                 Manual relevance assessment


  Segmentation Methods and Retrieval Runs



                   Segmentation*:
                       Lexical cohesion based algorithms: TextTiling, C99
                       Time- and length-based algorithms:
                       time length = 60, 120, 150, 180 seconds;
                       number of words per segment = 300, 400
                       Extreme case: No segmentation
                   Retrieval system:
                       SMART extended to use language modeling

          * Manual boundaries for both types of transcript




  Score Results for 1000 Retrieved Documents (asr man)

                          Run        MAP     MAiP    MASP    MASDWP
                          c99        0.438   0.275   0.218   0.177
                          tt         0.421   0.275   0.221   0.173
                          len 300    0.416   0.287   0.248   0.181
                          len 400    0.463   0.286   0.237   0.147
                          time 120   0.428   0.296   0.256   0.196
                          time 150   0.448   0.283   0.243   0.171
                          time 180   0.473   0.300   0.246   0.163
                          time 60    0.333   0.259   0.238   0.220
                          one doc    0.686   0.109   0.085   0.009
                  one doc run: highest MAP score, but lowest score on all
                  other metrics → contradicts the user experience
                  time 60: highest MASDWP rank → the shorter average segment
                  length makes it easier to return a segment that starts
                  close to the jump-in point
Experimental Results    New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Capturing Difference Between Segmentations

                        Rank   c99              time 180        time 60
                        3      —                179/179 (–)     60/60 (–)
                        4      243/243 (–)      179/179 (–)     59/59 (1)
                        5      —                180/180 (-69)   60/60 (–)
                        6      105/125 (20)     —               59/59 (-10)
                        7      157/204 (47)     179/179 (0)     59/59 (–)
                        8      107/107 (-45)    59/179          60/60 (–)
                        9      350/429 (47)     162/180 (-4)    60/60 (21)
                        10     122/122 (-11)    143/181 (–)     —

                          AP:    one doc > time 180 > c99 > time 60
                          AiP:   c99 > time 180 > time 60 > one doc
                          ASP:   time 180 > c99 > time 60 > one doc
                          ASDWP: c99 > time 180 > time 60 > one doc
Experimental Results   New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Impact of Averaging Techniques
                              AiP: man < asr man;  ASP: man > asr man
         (relevant content moves down from higher ranks)
Conclusions       New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval


  Conclusions

         MAP and MAiP do not reflect the user experience of informally
         structured speech documents:
              MAP is appropriate for clearly defined documents
              MAiP works with transcript characters
         Introduced MASP and MASDWP:

              MASP: captures the amount of relevant content that
              appears at different ranks

               MASDWP: rewards runs where the segmentation algorithm
               places boundaries closer to the relevant content and these
               segments appear higher in the ranked list
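The MASDWP idea can be illustrated with a small sketch (an illustrative approximation, not the exact formula from the paper; the reciprocal distance penalty, the `scale` parameter, and the data layout are all assumptions):

```python
# Illustrative sketch of a distance-weighted precision score in the spirit
# of MASDWP. NOT the exact metric definition; the penalty shape is assumed.

def distance_weight(seg_start, jump_in, scale=30.0):
    """Weight in (0, 1]: 1.0 when the retrieved segment starts exactly at
    the jump-in point, decaying as the boundary moves away from it."""
    return 1.0 / (1.0 + abs(seg_start - jump_in) / scale)

def avg_distance_weighted_precision(ranked, relevant):
    """ranked:   list of (doc_id, segment_start_seconds) in rank order.
    relevant: dict doc_id -> jump-in point (seconds) of relevant content.
    Returns a precision-like score discounted by boundary distance."""
    score, hits = 0.0, 0
    for rank, (doc, seg_start) in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            # precision at this rank, discounted by how far the segment
            # boundary falls from the ideal jump-in point
            score += (hits / rank) * distance_weight(seg_start, relevant[doc])
    return score / len(relevant) if relevant else 0.0
```

Under a score of this shape, shorter segments (e.g. time 60) tend to start closer to the jump-in point and so are penalized less, which is consistent with the time 60 run ranking highest on MASDWP in the results table.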




         Thank you for your attention!

New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)

  • 1. New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Maria Eskevich1 , Walid Magdy2,3 , Gareth J.F. Jones1,2 1Centre for Digital Video Processing 2 Centre for Next Generation Localisation School of Computing Dublin City University, Dublin, Ireland 3 Qatar Computing Research Institute - Qatar Foundation Doha, Qatar April, 3, 2012
  • 2. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Outline Speech Retrieval Speech Search Evaluation Mean Average Precision (MAP) Mean Average interpolated Precision (MAiP) mean Generalized Average Precision (mGAP) New Metrics Mean Average Segment Precision (MASP) Mean Average Segment Distance-Weighted Precision (MASDWP) Retrieval Collection Experimental Results Conclusions
  • 3. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity
  • 4. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news:
  • 5. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news:
  • 6. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news: Meetings:
  • 7. Speech Retrieval New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval Speech Documents Diversity Broadcast news: Meetings:
• 17. Speech Retrieval
Pipeline (shown as a diagram): the speech collection (audio) is passed through an Automatic Speech Recognition system to produce a transcript (text); the transcript is segmented, and the segments are indexed. A text query forms the information request; retrieval over the indexed segments returns textual segments, which map back to the speech segments presented to the user.
• 23. Related Work in Speech Search Evaluation
Retrieval units and the metrics used to evaluate them:
- Clearly defined documents (TREC SDR): Mean Average Precision (MAP)
- Passages (INEX): Mean Average interpolated Precision (MAiP)
- Jump-in points (CLEF CL-SR): mean Generalized Average Precision (mGAP)
• 26. Mean Average interpolated Precision (MAiP)
Task: passage text retrieval; document relevance is not counted in a binary way.
Precision at rank r: the fraction of retrieved characters that are relevant.
Average interpolated Precision (AiP): the average of the interpolated precision scores calculated at 101 recall levels (0.00, 0.01, ..., 1.00):
    AiP = (1/101) · Σ_{x = 0.00, 0.01, ..., 1.00} iP[x]
Shortcoming: averaging over characters in the transcript is not suitable for speech tasks.
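As a sketch of how AiP is computed from a precision-recall curve (the interpolation rule, taking the maximum precision at any recall level at or above x, follows the standard TREC/INEX convention; the function and parameter names are ours):

```python
def average_interpolated_precision(precision, recall):
    """AiP: mean of interpolated precision at 101 recall levels 0.00..1.00.

    `precision[i]` / `recall[i]` describe the precision-recall curve of one
    query's ranked list (recall non-decreasing). Interpolated precision at
    recall x is the maximum precision at any recall level >= x.
    """
    levels = [i / 100 for i in range(101)]
    ip = []
    for x in levels:
        candidates = [p for p, r in zip(precision, recall) if r >= x]
        ip.append(max(candidates) if candidates else 0.0)
    return sum(ip) / 101
```

For example, a run with precision 1.0 at recall 0.5 and 0.5 at recall 1.0 scores (51 · 1.0 + 50 · 0.5) / 101 ≈ 0.752.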
• 31. mean Generalized Average Precision (mGAP)
Task: retrieval of jump-in points in time for relevant content.
    GAP = (1/n) · Σ_{r=1}^{N} P[r] · (1 − 0.1 · Distance/Granularity)
Shortcoming: does not take into account how much time the user needs to spend listening before reaching the relevant content.
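A minimal sketch of the GAP idea, assuming each returned jump-in point may match at most one true relevant point and that the penalty term is floored at zero (the matching policy and all names are our assumptions, not the CLEF CL-SR specification):

```python
def generalized_average_precision(results, relevant_points, granularity=60.0):
    """GAP sketch: like AP, but each match is discounted by how far the
    returned jump-in point is from the true start of the relevant content.

    `results`: ranked list of returned start times (seconds).
    `relevant_points`: true jump-in points; each may be matched once.
    Penalty: 0.1 per `granularity` seconds of distance, floored at 0
    (our reading of the slide's  1 - 0.1 * Distance/Granularity  term).
    """
    n = len(relevant_points)
    remaining = list(relevant_points)
    hits, score = 0, 0.0
    for rank, start in enumerate(results, 1):
        if not remaining:
            break
        nearest = min(remaining, key=lambda t: abs(t - start))
        penalty = max(0.0, 1.0 - 0.1 * abs(nearest - start) / granularity)
        if penalty > 0.0:            # close enough to count as a match
            remaining.remove(nearest)
            hits += 1
            score += (hits / rank) * penalty
    return score / n if n else 0.0
```

An exact hit at rank 1 scores 1.0; a jump-in point ten granularity units away contributes nothing.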
• 33. Time Precision Oriented Metrics
Motivation:
- create a metric that measures both the ranking quality and the segmentation quality with respect to relevance in a single score
- reflect how far into the segment at a given rank the user has to listen before the relevant part actually begins
• 39. Mean Average Segment Precision (MASP)
Segment Precision SP[r] at rank r: the fraction of the total playback time of the top r segments that is relevant.
Average Segment Precision:
    ASP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r)
where rel(s_r) = 1 if relevant content is present in segment s_r and rel(s_r) = 0 otherwise, and n is the number of relevant segments.
Differences from the other metrics:
- the amount of relevant content is measured over time instead of text
- ASP is calculated at the ranks of segments containing relevant content rather than at fixed recall points as in MAiP
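As a sketch of the ASP computation, assuming SP[r] is the cumulative relevant playback time divided by the cumulative total playback time of the top r segments, consistent with the deck's worked example (function and argument names are ours):

```python
def average_segment_precision(segments):
    """ASP sketch over a ranked list of (relevant_seconds, total_seconds).

    SP[r] = cumulative relevant time in the top r segments
            / cumulative total time of the top r segments.
    ASP averages SP[r] over the ranks of segments that contain relevant
    content; n here is the number of such segments (in a full evaluation
    n would be the total number of relevant segments for the query).
    """
    rel_time = tot_time = 0.0
    n = sum(1 for rel, _ in segments if rel > 0)
    asp = 0.0
    for rel, tot in segments:
        rel_time += rel
        tot_time += tot
        if rel > 0:                  # rel(s_r) = 1
            asp += rel_time / tot_time
    return asp / n if n else 0.0
```

On the deck's six-segment example this reproduces the MASP value of roughly 0.557.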
• 40. Mean Average Segment Distance-Weighted Precision (MASDWP)
Penalize ASP with an mGAP-style distance term:
    ASDWP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r) · (1 − 0.1 · Distance/Granularity)
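A corresponding sketch for ASDWP, under the same reading of SP[r] plus the slide's distance penalty 1 − 0.1 · Distance/Granularity; the flooring at zero and the per-segment representation of the distance are our assumptions:

```python
def average_segment_distance_weighted_precision(segments, granularity=60.0):
    """ASDWP sketch: ASP with an mGAP-style distance penalty per segment.

    `segments`: ranked list of (relevant_seconds, total_seconds, distance),
    where distance is from the segment start to the start of the relevant
    content (None for non-relevant segments).
    """
    rel_time = tot_time = 0.0
    n = sum(1 for rel, _, _ in segments if rel > 0)
    score = 0.0
    for rel, tot, dist in segments:
        rel_time += rel
        tot_time += tot
        if rel > 0:
            penalty = max(0.0, 1.0 - 0.1 * dist / granularity)
            score += (rel_time / tot_time) * penalty
    return score / n if n else 0.0
```

A segment whose relevant content starts exactly at the segment boundary keeps its full SP[r]; one that starts ten granularity units in contributes nothing.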
• 51. Comparative example of AP, ASP and ASDWP
Six retrieved segments, four of which contain relevant content; the AP column gives precision at rank, the ASP column gives SP[r], and the averages are taken over the ranks of the relevant segments.

Rank  Rel Len / Total Len   AP    ASP     ASDWP
1     2/3                   1     2/3     2/3 × 1.0
2     0/5                   1/2   2/8     2/8 × 0.0
3     3/4                   2/3   5/12    5/12 × 0.9
4     6/6                   3/4   11/18   11/18 × 0.0
5     0/2                   3/5   11/20   11/20 × 0.0
6     5/10                  4/6   16/30   16/30 × 0.0

MAP = 0.771, MASP = 0.557, MASDWP = 0.260
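The example can be checked numerically; this short script recomputes the three scores from the table above using exact fractions (variable names are ours):

```python
from fractions import Fraction as F

# The six retrieved segments of the example:
# (relevant length, total length, distance penalty for the jump-in point)
segs = [(2, 3, F(1)), (0, 5, None), (3, 4, F(9, 10)),
        (6, 6, F(0)), (0, 2, None), (5, 10, F(0))]

n_rel = sum(1 for rel, _, _ in segs if rel > 0)  # 4 relevant segments
rel_cum = tot_cum = hits = 0
ap = asp = asdwp = F(0)
for rank, (rel, tot, pen) in enumerate(segs, 1):
    rel_cum += rel
    tot_cum += tot
    if rel > 0:
        hits += 1
        ap += F(hits, rank)          # document-level precision at rank
        sp = F(rel_cum, tot_cum)     # segment precision SP[r]
        asp += sp
        asdwp += sp * pen

print(f"MAP={float(ap / n_rel):.3f} "
      f"MASP={float(asp / n_rel):.3f} "
      f"MASDWP={float(asdwp / n_rel):.3f}")
# prints MAP=0.771 MASP=0.557 MASDWP=0.260
```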
• 53. Test Collection
Speech collection: AMI Corpus
- ca. 100 hours of data (80 hours of speech)
- 160 meetings; average length 30 minutes
Transcripts:
- manual
- Automatic Speech Recognition (ASR), WER ≈ 30%
Retrieval test set:
- 25 queries, with text taken from the PowerPoint slides provided with the AMI Corpus (average length > 10 content words)
- manual relevance assessment
• 54. Segmentation Methods and Retrieval Runs
Segmentation*:
- lexical-cohesion-based algorithms: TextTiling, C99
- time- and length-based algorithms: segment duration = 60, 120, 150, 180 seconds; words per segment = 300, 400
- extreme case: no segmentation
Retrieval system: SMART, extended to use language modeling
* Manual boundaries for both types of transcript
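The time-based runs can be illustrated with a minimal fixed-window segmenter over a time-stamped transcript; this is a sketch under our own assumptions about the input format, not the actual experimental code (the TextTiling/C99 and length-based runs used different boundary rules):

```python
def time_segments(words, window=60.0):
    """Cut a time-stamped transcript into fixed-duration windows.

    `words` is a list of (start_time_in_seconds, token) pairs in time
    order. Mirrors the deck's time-based segmentation with window
    durations of 60/120/150/180 seconds.
    """
    segments, current, boundary = [], [], window
    for start, token in words:
        if start >= boundary:
            if current:
                segments.append(current)
                current = []
            while start >= boundary:   # skip over silent windows
                boundary += window
        current.append(token)
    if current:
        segments.append(current)
    return segments
```

For example, tokens starting at 0 s, 30 s, 61 s, and 125 s fall into three 60-second segments.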
• 60. Scores: Results for 1000 retrieved documents (asr man transcript)

Run       MAP    MAiP   MASP   MASDWP
c99       0.438  0.275  0.218  0.177
tt        0.421  0.275  0.221  0.173
len 300   0.416  0.287  0.248  0.181
len 400   0.463  0.286  0.237  0.147
time 120  0.428  0.296  0.256  0.196
time 150  0.448  0.283  0.243  0.171
time 180  0.473  0.300  0.246  0.163
time 60   0.333  0.259  0.238  0.220
one doc   0.686  0.109  0.085  0.009

Observations:
- one doc: the highest MAP, but the lowest score on every other metric; this contradicts the user experience
- time 60: the highest MASDWP; a shorter average segment length makes it easier to place the segment start close to the jump-in point
• 63. Capturing Difference Between Segmentations
Relevant length / segment length for the segments retrieved at ranks 3-10, with the distance to the jump-in point in parentheses:

Rank  c99            time 180       time 60
3     –              179/179 (–)    60/60 (–)
4     243/243 (–)    179/179 (–)    59/59 (1)
5     –              180/180 (-69)  60/60 (–)
6     105/125 (20)   –              59/59 (-10)
7     157/204 (47)   179/179 (0)    59/59 (–)
8     107/107 (-45)  59/179         60/60 (–)
9     350/429 (47)   162/180 (-4)   60/60 (21)
10    122/122 (-11)  143/181 (–)    –

Resulting metric orderings:
- AP: one doc > time 180 > c99 > time 60
- AiP: c99 > time 180 > time 60 > one doc
- ASP: time 180 > c99 > time 60 > one doc
- ASDWP: c99 > time 180 > time 60 > one doc
• 67. Impact of Averaging Techniques
AiP: man < asr man; ASP: man > asr man (relevant content moves down from the higher ranks)
• 71. Conclusions
MAP and MAiP do not reflect the user experience of informally structured speech documents:
- MAP is appropriate for clearly defined documents
- MAiP works with transcript characters rather than playback time
Introduced MASP and MASDWP:
- MASP captures the amount of relevant content that appears at different ranks, measured over time
- MASDWP additionally rewards runs where the segmentation places boundaries closer to the relevant content and ranks those segments higher
• 72. Thank you for your attention!