We introduce two new metrics for the evaluation of search effectiveness for informally structured speech data: mean average segment precision (MASP) which measures retrieval performance in terms of both content segmentation and ranking with respect to relevance; and mean average segment distance-weighted precision (MASDWP) which takes into account the distance between the start of the relevant segment and the retrieved segment. We demonstrate the effectiveness of these new metrics on a retrieval test collection based on the AMI meeting corpus.
New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval (ECIR 2012)
1. New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval
Maria Eskevich (1), Walid Magdy (2,3), Gareth J.F. Jones (1,2)
(1) Centre for Digital Video Processing
(2) Centre for Next Generation Localisation
School of Computing, Dublin City University, Dublin, Ireland
(3) Qatar Computing Research Institute - Qatar Foundation, Doha, Qatar
April 3, 2012
2. Outline
Speech Retrieval
Speech Search Evaluation
Mean Average Precision (MAP)
Mean Average interpolated Precision (MAiP)
mean Generalized Average Precision (mGAP)
New Metrics
Mean Average Segment Precision (MASP)
Mean Average Segment Distance-Weighted Precision (MASDWP)
Retrieval Collection
Experimental Results
Conclusions
3. Speech Documents Diversity
Broadcast news:
Meetings:
8. Speech Retrieval
[Pipeline diagram] The speech collection (audio) is passed through an Automatic Speech Recognition system to produce a transcript (text). The transcript is segmented, and the resulting segments are indexed. An information request is expressed as a text query; retrieval over the indexed segments returns ranked textual segments, which map back to the corresponding speech segments in the audio.
18. Outline: Speech Search Evaluation
19. Related Work in Speech Search Evaluation
Retrieval units:
Clearly defined documents (TREC SDR): Mean Average Precision (MAP)
Passages (INEX): Mean Average interpolated Precision (MAiP)
Jump-in points (CLEF CL-SR): mean Generalized Average Precision (mGAP)
24. Mean Average interpolated Precision (MAiP)
Task: passage text retrieval.
Document relevance is not counted in a binary way.
Precision at rank r: the fraction of retrieved characters that are relevant.
Average interpolated Precision (AiP): average of interpolated precision scores calculated at 101 recall levels (0.00, 0.01, ..., 1.00):

$$\mathrm{AiP} = \frac{1}{101} \sum_{x = 0.00,\, 0.01,\, \ldots,\, 1.00} iP[x]$$

Shortcomings: averaging over characters in the transcript is not suitable for speech tasks.
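To make the character-based averaging concrete, here is a minimal Python sketch of AiP for a single query. It assumes relevance judgments are available as per-passage counts of relevant characters; the function name and data layout are illustrative, not from the paper.

```python
def average_interpolated_precision(retrieved, total_relevant_chars):
    """AiP sketch: `retrieved` is a list of (relevant_chars, total_chars)
    per rank; `total_relevant_chars` is the relevant character count
    for this query over the whole collection."""
    points = []
    rel = chars = 0
    for rel_c, tot_c in retrieved:
        rel += rel_c
        chars += tot_c
        points.append((rel / total_relevant_chars,  # character recall
                       rel / chars))                # character precision
    # Interpolated precision at recall x: best precision at recall >= x.
    def iP(x):
        return max((p for r, p in points if r >= x), default=0.0)
    # Average over the 101 standard recall levels 0.00, 0.01, ..., 1.00.
    return sum(iP(i / 100) for i in range(101)) / 101
```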
27. mean Generalized Average Precision (mGAP)
Task: retrieval of the jump-in points in time for relevant content.

$$\mathrm{GAP} = \frac{1}{n} \sum_{r=1}^{N} P[r] \cdot \left(1 - \frac{\mathrm{Distance}}{\mathrm{Granularity}} \cdot 0.1\right)$$

Shortcomings: does not take into account how much time the user needs to spend listening to access the relevant content.
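A minimal sketch of the GAP computation for one query, under the assumptions that P[r] is the usual precision at rank r and that Distance is the offset (in seconds) between the returned jump-in point and the ideal one; the granularity value and data layout are illustrative.

```python
def generalized_average_precision(results, n_relevant, granularity=10.0):
    """GAP sketch: `results` is a list of (is_relevant, distance_seconds)
    per rank; `n_relevant` is the number of relevant items for the query."""
    gap, hits = 0.0, 0
    for rank, (is_rel, dist) in enumerate(results, start=1):
        if not is_rel:
            continue
        hits += 1
        # Penalty of 0.1 per granularity window between the returned
        # and the ideal jump-in point, floored at zero.
        weight = max(0.0, 1.0 - (dist / granularity) * 0.1)
        gap += (hits / rank) * weight
    return gap / n_relevant
```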
32. Outline: New Metrics
33. Time Precision Oriented Metrics
Motivation:
Create a metric that measures both the ranking quality and the segmentation quality with respect to relevance in a single score.
Reflect how far the user has to listen into the segment at a certain rank until the relevant part actually begins.
34. Mean Average Segment Precision (MASP)
Segment Precision SP[r] at rank r: the fraction of the total segment time retrieved up to rank r that is relevant.
Average Segment Precision:

$$\mathrm{ASP} = \frac{1}{n} \sum_{r=1}^{N} \mathrm{SP}[r] \cdot rel(s_r)$$

rel(s_r) = 1 if relevant content is present in segment s_r, otherwise rel(s_r) = 0.

Difference from other metrics:
the amount of relevant content is measured over time instead of text
average segment precision (ASP) is calculated at the ranks of segments containing relevant content, rather than at fixed recall points as in MAiP
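A minimal sketch of SP[r] and ASP for a single query, following the definition above: relevance is measured over time, SP[r] is the relevant fraction of all segment time retrieved up to rank r, and the sum runs over the ranks of relevant segments. The (relevant_seconds, total_seconds) layout is illustrative.

```python
def average_segment_precision(segments, n_relevant):
    """ASP sketch: `segments` holds (relevant_seconds, total_seconds)
    per retrieved rank; `n_relevant` is the number of relevant
    segments for the query."""
    asp = rel_time = total_time = 0.0
    for rel_sec, tot_sec in segments:
        rel_time += rel_sec
        total_time += tot_sec
        sp = rel_time / total_time      # SP[r]: time-based precision
        if rel_sec > 0:                 # rel(s_r) = 1
            asp += sp
    return asp / n_relevant
```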
40. Mean Average Segment Distance-Weighted Precision (MASDWP)
Penalize ASP results as in mGAP:

$$\mathrm{ASDWP} = \frac{1}{n} \sum_{r=1}^{N} \mathrm{SP}[r] \cdot rel(s_r) \cdot \left(1 - \frac{\mathrm{Distance}}{\mathrm{Granularity}} \cdot 0.1\right)$$
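A self-contained sketch of ASDWP that adds the mGAP-style start-point penalty to the ASP computation above. The per-rank distance between the retrieved segment's start and the beginning of the relevant content, and the granularity value, are illustrative assumptions.

```python
def average_segment_distance_weighted_precision(segments, n_relevant,
                                                granularity=10.0):
    """ASDWP sketch: `segments` holds per-rank tuples of
    (relevant_seconds, total_seconds, start_distance_seconds)."""
    asdwp = rel_time = total_time = 0.0
    for rel_sec, tot_sec, dist in segments:
        rel_time += rel_sec
        total_time += tot_sec
        sp = rel_time / total_time           # SP[r], as in ASP
        if rel_sec > 0:
            # 0.1 penalty per granularity window, floored at zero.
            weight = max(0.0, 1.0 - (dist / granularity) * 0.1)
            asdwp += sp * weight
    return asdwp / n_relevant
```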
41. Comparative example of AP, ASP and ASDWP

Six retrieved segments; Rel Len/Total Len gives the relevant and total length of the segment at each rank:

Rank  Rel Len/Total Len  AP   ASP    ASDWP
1     2/3                1    2/3    2/3 * 1.0
2     0/5                1/2  2/8    2/8 * 0.0
3     3/4                2/3  5/12   5/12 * 0.9
4     6/6                3/4  11/18  11/18 * 0.0
5     0/2                3/5  11/20  11/20 * 0.0
6     5/10               4/6  16/30  16/30 * 0.0

MAP = 0.771    MASP = 0.557    MASDWP = 0.260
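The slide's scores can be reproduced in a few lines of Python; the (relevant, total) lengths and the distance weights (1.0, 0.9, 0.0, ...) are read directly off the table above, and four of the six retrieved segments contain relevant content.

```python
ranks = [(2, 3), (0, 5), (3, 4), (6, 6), (0, 2), (5, 10)]
weights = [1.0, 0.0, 0.9, 0.0, 0.0, 0.0]  # distance penalties from the slide
n_relevant = 4

ap = asp = asdwp = 0.0
hits = rel_len = total_len = 0
for r, ((rel, tot), w) in enumerate(zip(ranks, weights), start=1):
    rel_len += rel
    total_len += tot
    if rel > 0:
        hits += 1
        ap += hits / r                 # document-level precision at rank r
        sp = rel_len / total_len       # SP[r]: time-based precision
        asp += sp
        asdwp += sp * w

print(round(ap / n_relevant, 3))       # 0.771 (MAP)
print(round(asp / n_relevant, 3))      # 0.557 (MASP)
print(round(asdwp / n_relevant, 3))    # 0.26  (MASDWP)
```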
52. Outline: Retrieval Collection
53. Test Collection
Speech collection: AMI Corpus
Ca. 100 hours of data (80 hours of speech)
160 meetings, average length 30 minutes
Transcripts:
Manual
Automatic Speech Recognition (ASR), WER ≈ 30%
Retrieval test set:
25 queries with text taken from PowerPoint slides provided with the AMI Corpus (average length > 10 content words)
Manual relevance assessment
54. Segmentation Methods and Retrieval Runs
Segmentation*:
Lexical-cohesion-based algorithms: TextTiling, C99
Time- and length-based algorithms:
time-based: 60, 120, 150, 180 seconds per segment
length-based: 300 or 400 words per segment
Extreme case: no segmentation (each meeting as one document)
Retrieval system:
SMART, extended to use language modelling
* Manual boundaries for both types of transcript
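As an illustration of the lexical-cohesion segmenters, here is a hedged sketch using NLTK's TextTiling implementation; the paper's exact tooling, parameters, and file paths are not specified here, so treat this as one possible setup rather than the authors' pipeline.

```python
# Requires: pip install nltk, plus nltk.download('stopwords').
from nltk.tokenize import TextTilingTokenizer

# Illustrative path; a flat ASR transcript may need pseudo-paragraph
# breaks ("\n\n"), e.g. at long pauses, before TextTiling can run.
transcript = open("meeting_transcript.txt").read()

tt = TextTilingTokenizer()
segments = tt.tokenize(transcript)      # list of topically coherent chunks
for i, seg in enumerate(segments):
    print(f"segment {i}: {len(seg.split())} words")
```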
55. Outline: Experimental Results
56. Score results for 1000 retrieved documents (asr man)

Run       MAP    MAiP   MASP   MASDWP
c99       0.438  0.275  0.218  0.177
tt        0.421  0.275  0.221  0.173
len 300   0.416  0.287  0.248  0.181
len 400   0.463  0.286  0.237  0.147
time 120  0.428  0.296  0.256  0.196
time 150  0.448  0.283  0.243  0.171
time 180  0.473  0.300  0.246  0.163
time 60   0.333  0.259  0.238  0.220
one doc   0.686  0.109  0.085  0.009

one doc run: highest score on MAP only, while all other metrics give it the lowest score → contradicts user experience
time 60: the highest MASDWP score → shorter average segment length makes it easier to capture a segment start close to the jump-in point
61. Capturing Difference Between Segmentations

Relevant/total segment length at each rank; parenthesised values give the offset between the segment start and the start of the relevant content:

Rank  c99             time 180       time 60
3     –               179/179 (–)    60/60 (–)
4     243/243 (–)     179/179 (–)    59/59 (1)
5     –               180/180 (-69)  60/60 (–)
6     105/125 (20)    –              59/59 (-10)
7     157/204 (47)    179/179 (0)    59/59 (–)
8     107/107 (-45)   59/179         60/60 (–)
9     350/429 (47)    162/180 (-4)   60/60 (21)
10    122/122 (-11)   143/181 (–)    –

AP:    one doc > time 180 > c99 > time 60
AiP:   c99 > time 180 > time 60 > one doc
ASP:   time 180 > c99 > time 60 > one doc
ASDWP: c99 > time 180 > time 60 > one doc
64. Impact of Averaging Techniques
[Graphs omitted]
AiP: man < asr man; ASP: man > asr man
(relevant content moves down from higher ranks)
68. Outline: Conclusions
69. Conclusions
MAP and MAiP do not reflect the user experience of informally structured speech documents:
MAP is appropriate for clearly defined documents
MAiP works with transcript characters
Introduced MASP and MASDWP:
MASP: captures the amount of relevant content that appears at different ranks
MASDWP: rewards runs where the segmentation algorithm puts boundaries closer to the relevant content and these segments are ranked higher
72. Thank you for your attention!