Penalty Functions for Evaluation Measures
of Unsegmented Speech Retrieval
Petra Galuščáková, Pavel Pecina, Jan Hajič
Institute of Formal and Applied Linguistics
Charles University in Prague
{galuscakova,pecina,hajic}@ufal.mff.cuni.cz
Motivation
● Speech Retrieval
● Retrieving information from a collection of audio data in
response to a given query
– the query may be in either modality, text or speech
● Usually solved as text retrieval on transcriptions of the audio
obtained by automatic speech recognition (ASR)
● But: speech transcriptions are not 100% accurate, the vocabulary is
different, speech contains additional elements, and speech is usually
not segmented into topically coherent passages
→ special evaluation methods for speech retrieval
are needed
Evaluation of speech retrieval I
● Known Segment Boundaries
● Speech collection is segmented into passages which can play the
role of documents
● Precision/Recall
● Average Precision (AP)
– arithmetic mean of the precision values measured at the rank of
each relevant retrieved document
● Mean Average Precision (MAP)
– arithmetic mean of the AP values over the set of queries
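The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in any campaign; the function names and the choice of denominator (the number of relevant documents in the collection when known, otherwise the number retrieved) are assumptions.

```python
def average_precision(relevance, n_relevant=None):
    """AP for one query.

    relevance  -- list of booleans, one per ranked result (True = relevant)
    n_relevant -- total number of relevant documents, if known (assumption:
                  fall back to the number of relevant documents retrieved)
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision at rank k
    denom = n_relevant if n_relevant is not None else hits
    return total / denom if denom else 0.0


def mean_average_precision(runs):
    """MAP: arithmetic mean of AP over a list of per-query relevance lists."""
    scores = [average_precision(r) for r in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

For a ranked list that is relevant at ranks 1 and 3, the precision values are 1 and 2/3, so AP is their mean.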
Evaluation of speech retrieval II
● Unknown Boundaries
● No topical segmentation, the system is expected to retrieve
exact starting points for each query
● Mean Average Segment Precision
– recently introduced, used in MediaEval
– designed for evaluation of retrieval of relevant document
parts
● Mean Generalized Average Precision
– designed to allow certain tolerance in matching search
results against a gold standard relevance assessment
– tolerance is determined by the Penalty Function
Evaluation of speech retrieval
- mGAP score
GAP is computed for a single query as:
\[ \mathrm{GAP} = \frac{\sum_{R_k \neq 0} p_k}{N}, \qquad p_k = \frac{\sum_{i=1}^{k} R_i}{k} \]
N = number of assessed starting points
R_k = reward for the result at position k, calculated according to the Penalty Function
p_k = value of Precision for the position k
mGAP = arithmetic mean of the GAP values over the set of queries
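As a minimal sketch of the GAP and mGAP definitions on this slide: the code below takes the rewards R_k as given (one per ranked result, 0 where the Penalty Function gives no reward) together with N. The function names and the input layout are assumptions made for illustration.

```python
def gap(rewards, n_assessed):
    """GAP for one query.

    rewards    -- R_k for ranks k = 1, 2, ...; 0.0 where no reward is given
    n_assessed -- N, the number of assessed starting points for the query
    """
    total = 0.0
    running = 0.0  # sum of R_1 .. R_k
    for k, r in enumerate(rewards, start=1):
        running += r
        if r != 0:
            total += running / k  # p_k, summed only where R_k != 0
    return total / n_assessed if n_assessed else 0.0


def mgap(per_query):
    """mGAP: arithmetic mean of GAP over (rewards, n_assessed) pairs."""
    scores = [gap(rewards, n) for rewards, n in per_query]
    return sum(scores) / len(scores) if scores else 0.0
```

With rewards [1.0, 0.0, 0.5] and N = 2, the contributing precisions are p_1 = 1 and p_3 = 0.5, which gives GAP = 0.75.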
Evaluation of speech retrieval
- mGAP score
● The Penalty Function is a function of the time difference between
the starting point of the topic determined by the system and the true
starting point of this topic obtained during relevance assessment
● The actual shape of the function can be chosen arbitrarily
● The Penalty Function used in the mGAP measure in the
Cross-Language Speech Retrieval Track of CLEF 2006 and 2007
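The text of this slide does not give the numeric shape of the CLEF function, so the following is only a sketch under stated assumptions: a symmetric step function that loses one tenth of the reward for every full 15 seconds of distance and reaches zero at 150 seconds. The step size, the number of steps, and the rounding direction are all assumptions and should be checked against the track guidelines.

```python
import math


def symmetric_step_penalty(offset_s, step_s=15.0, n_steps=10):
    """Reward in [0, 1] for a playback point offset_s seconds from the true start.

    Assumed shape: full reward inside the first step, then the reward drops by
    1/n_steps per completed step; the sign of the offset is ignored, so points
    before and after the true start are treated alike.
    """
    completed = math.floor(abs(offset_s) / step_s)
    return max(0, n_steps - completed) / n_steps
```

Because only the absolute offset is used, a point 30 seconds early and a point 30 seconds late receive the same reward, which is the symmetry the deck goes on to question.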
Evaluation of speech retrieval
- mGAP score
● mGAP has been widely used; however, the measure (and the Penalty
Function itself) has not been adequately studied
● Questions:
● symmetry: the Penalty Function is symmetrical, so starting points
retrieved by a system at the same distance before and after
a true starting point are treated as equally good (or bad)
– “shape” of the function itself
● “width” of the Penalty Function, i.e. the maximum distance
for which the reward is non-zero
Penalty Function Proposal
Methodology
● Lab test to study the behaviour of users
● IR system simulation
● Users were presented with the topics from the test collection and
playback points randomly generated in the vicinity of a starting point
of a relevant segment
● Users were asked to navigate in the recording and indicate when the
speaker started to talk about the given topic
● After they found the relevant segment, the participants were asked
to indicate their satisfaction with the playback point
Number of participants: 24
Number of processed starting points: 263
Data
● Test collection used for Cross-Language Speech-Retrieval track of
CLEF 2007
● Manually processed by human assessors – relevant passages for
given topics were identified
● Part of oral history archive from the Malach Project (Holocaust
testimonies)
● Recorded in Czech
Recordings in Malach Project: 52 000
Czech recordings in Malach Project: 700
Assessed Czech recordings: 357
Average length of the recording: 95 min
Processed topics: 116
Results
Time analysis
● We measure the elapsed time between the beginning of playback
and the moment when the participant presses the button indicating
that the relevant passage was found
● Respondents generally need less time to complete the task when
the playback point is located before the true starting point
Users’ satisfaction
● Participants were requested to indicate to what extent they were
happy with the location of the playback points on a scale of: very
good, good, bad or very bad
● Trend not clear: participants were most satisfied when the playback
point lies shortly before the true starting point, but the function
value decreases more slowly for positive time
Proposed mGAP
Modifications
1) Users prefer playback points appearing before the beginning of
a true relevant passage to those appearing after it, i.e. more
reward should be given to playback points appearing before the
true starting point of a relevant segment.
2) Users are tolerant of playback points appearing within a
1-minute distance from the true starting point, i.e. equal
(maximum) reward should be given to all playback points which
are closer than one minute to the true starting point.
3) Users are still satisfied when playback points appear at a two- or
three-minute distance from the true starting point, i.e. the function
should be “wider”.
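The three modifications can be combined into one function. The sketch below is not the function proposed by the authors: only the one-minute plateau comes from the slide, while the linear decay, the two widths (180 seconds before, 120 seconds after) and the sign convention (negative offset = playback point before the true start) are hypothetical choices made for illustration.

```python
def asymmetric_penalty(offset_s, plateau_s=60.0,
                       width_before_s=180.0, width_after_s=120.0):
    """Reward in [0, 1] for a playback point offset_s seconds from the true start.

    offset_s < 0 means the playback point lies before the true starting point
    (assumed convention). The widths are hypothetical illustration values.
    """
    width = width_before_s if offset_s < 0 else width_after_s
    dist = abs(offset_s)
    if dist <= plateau_s:
        return 1.0  # modification 2: equal maximum reward near the true start
    if dist >= width:
        return 0.0  # outside the (wider) window: no reward
    # Linear decay from the plateau edge to the window edge (assumed shape).
    return (width - dist) / (width - plateau_s)
```

With these values a point two minutes early still earns 0.5 while a point two minutes late earns nothing, which reflects modification 1; widening either window reflects modification 3.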
Comparison with the Original
Measure
● Outputs of CLEF 2007 Cross-Language Speech Retrieval Track
● 15 retrieval systems were scored with the original and proposed
Penalty Functions
● High correlation between the two sets of scores
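The slide does not say which correlation coefficient was used. One common way to compare two evaluation measures is a rank correlation over the system scores; the sketch below computes Kendall's tau (the simple tau-a variant, where tied pairs count as neither concordant nor discordant) and is an assumption about a reasonable procedure, not a description of the authors' analysis.

```python
def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two equally long lists of system scores."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Sign of the product tells whether the pair is ordered alike.
            sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1.0 means both Penalty Functions rank the systems identically; here `scores_a` and `scores_b` would hold the mGAP of each system under the original and the proposed function, in the same system order.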
Conclusion
● We described the evaluation of speech retrieval (segmented/not
segmented)
● We described mGAP and the drawbacks of its Penalty Function
● We organized a human-based lab test
● Based on the lab test results, we modified the Penalty Function
● Finally, we compared the modified Penalty Function with the original
function
Thank you
