SlideShare a Scribd company logo
1 of 18
Download to read offline
11
Phoneme Similarity Matrices to Improve
Long Audio Alignment for Automatic Subtitling
Pablo Ruiz, Aitor Álvarez, Haritz Arzelus
{pruiz,aalvarez,harzelus}@vicomtech.org
Vicomtech-IK4, Donostia/San Sebastián, Spain
www.vicomtech.org
LREC, Reykjavik 28 May 2014
22
Introduction
THE NEED
Huge subtitling demand by broadcasters, for
accessibility compliance: Automatic subtitling
is an attractive option.
TALK OUTLINE
• Alignment system for automatic subtitling
– Phone decoder
– Grapheme-to-Phoneme transcriber
– Alignment algorithm
• Alignment Results applied to Subtitling
33
Alignment approach
• Long Audio Alignment: Aligns the audio signal
with a human transcript for the audio.
• System by Bordel et al. (2012):
– Alignment with Hirschberg’s algorithm.
– Simple system, but accuracy comparable to more
complex approaches (cf. Moreno et al. 1999)
– Uses a BINARY scoring matrix to evaluate alignment
operations.
• Our system: Hirschberg’s algorithm, using
NON-BINARY scoring matrices, improving vs.
binary
44
Speech-Text Alignment System
Phone
Decoder
G2P
Hirschberg’s
Algorithm
Similarity
Matrices
Aligned Words
and Subtitles
Time-codes(TC)
Phones
Aligned Phones + TC
55
Phone Decoder
Acoustic Model (AM): HTK – Monophone (18 MFCC + Δ + ΔΔ)
Bigram Phoneme Language Model (LM)
Sources for Corpora
• Spanish: Albayzin, Multext, SAVAS (AM), newspapers (LM)
• English: TIMIT (AM), newspapers (LM)
Languages Train Test PER
Spanish ~15H (LM 45 M) ~5H 40.65%
English ~4H (LM 369M) ~1H 35.52%
66
Grapheme-to-Phoneme (G2P) Transcriptors
• Spanish: rule-based
• English: inferred from CMU Dictionary
using Phonetisaurus (WFST)
• Phone-sets and details:
https://sites.google.com/site/similaritymatrices
77
Alignment Algorithm
• Hirschberg’s algorithm for sequence alignment
–Dynamic programming, optimization to Needleman-
Wunsch, applicable to longer sequences.
–Four operations
• Insertions
• Deletions
• Substitutions
• Matches
–Each operation is evaluated with a scoring function
• Binary vs. non-binary
• Goal: Promotes aligning equal or similar, easily confusable,
phonemes. Prevents aligning dissimilar phonemes.
BINARY
Gap Penalty
Similarity Score
NonBinary
88
Speech-Text Alignment (recap)
Phone
Decoder
G2P
Hirschberg’s
Algorithm
Similarity
Matrices
Aligned Words
and Subtitles
Time-codes(TC)
Phones
Aligned Phones + TC
99
Similarity Function: Input Features and Values
• Phonological similarity: Multivalued features
weighted by salience (i.e. impact in similarity)
Feature Values (English)
Feature Salience
1010
Similarity Function: Definition (cf. Kondrak 2002)
𝜎𝑠𝑢𝑏 𝑝, 𝑞 = ( 𝐶𝑠𝑢𝑏 −𝛿 𝑝, 𝑞 − 𝑉 𝑝 − 𝑉 𝑞 ) / 100
where if 𝑝 = 𝑞 , 𝑉 𝑝 − 𝑉 𝑞 = 0
else 𝑉 𝑥 =
0 if 𝑥 is a consonan𝑡
𝐶𝑣𝑤𝑙 otherwise
𝛿 𝑝, 𝑞 = |diff 𝑝, 𝑞, 𝑓 | × salience(𝑓)
𝑓∈𝑹
𝜎𝑠𝑘𝑖𝑝 𝑝 = |𝐶𝑠𝑘𝑖𝑝 / 100|
σsub: score for substituting phoneme p with q
1
2
3
4
Csub = 3500
Cvwl = 1000
Cskip = 10005
σskip: cost of skipping
1414
Evaluation: Method
• Alignment accuracy at word level and
subtitle level compared to manual subtitles
created by professional subtitlers
Alignment Test Corpora
SPANISH ENGLISH
Words 8,774 4,732
Subtitles 1,249 471
Content Clean speech
Noisy speech
Some reference
subtitles missing
1515
Evaluation: Results at Word Level
Spanish – Word Level
seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0
BinaryBaseline 14.17 57.71 72.65 76.21 79.02
PhonologicalSimilarity 23.01 82.21 93.50 95.50 96.83
English – Word Level
seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0
BinaryBaseline 0.28 4.81 19.04 29.69 43.24
PhonologicalSimilarity 1.93 23.76 48.90 61.58 72.72
Percentage of words aligned within a given deviation range
from reference
+81 +24 +20 +19 +18
1Gain in percentage points
+1.6 +19 +30 +32 +29
+81 +24 +20 +19 +18
+1.6 +19 +30 +32 +29
1616
Evaluation: Results at Subtitle Level
Spanish – Subtitle Level
seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0
BinaryBaseline 10.57 45.08 73.26 95.12 100
PhonologicalSimilarity 18.01 66.45 87.83 98.80 100
English – Subtitle Level
seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0
BinaryBaseline 0.21 4.25 37.15 84.29 100
PhonologicalSimilarity 0.42 8.28 46.07 86.84 100
Percentage of subtitles aligned within a given deviation
range from reference
+71 +21 +14 +3.5 =
+0.2 +4 +9 +2.5 =
1Gain in percentage points
+71 +21 +14 +3.5 =
+0.2 +4 +9 +2.5 =
1717
Conclusions and Further Work
• Continuous scoring matrices based on phonological
similarity improve alignment compared to a binary
matrix.
• Further work: Other similarity criteria can also
improve results.
English – Subtitle Level
seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0
Binary 0.21 4.03 36.94 84.08 100
PhonologicalSim 0.42 8.28 43.10 85.99 100
DecoderErrorBased 0.42 11.46 48.83 88.32 100
ICASSP (May 2014), Text, Speech and Dialogue (Sept 2014)
+0.2 +4 +6 +2 =
+0.2 +7 +12 +4 =
1818
Thank you
https://sites.google.com/site/similaritymatrices

More Related Content

Similar to Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtiitling

Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Sheeyam Shellvacumar
 
neural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationneural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationJEE HYUN PARK
 
"Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str...
"Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str..."Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str...
"Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str...Anıl Osman Tur
 
General Speereo Technology
General Speereo TechnologyGeneral Speereo Technology
General Speereo TechnologyDaniel Ischenko
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition systemavinash raibole
 
The Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine TranslationThe Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine TranslationGennadi Lembersky
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Reviewinscit2006
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPCDisha Modi
 
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion RecognitionSeoul National University
 
Environmentally robust ASR front end for DNN-based acoustic models
Environmentally robust ASR front end for DNN-based acoustic modelsEnvironmentally robust ASR front end for DNN-based acoustic models
Environmentally robust ASR front end for DNN-based acoustic modelsTakuya Yoshioka
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine TranslationRIILP
 
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training EnsemblesSemi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training EnsemblesMohamed El-Geish
 
Presentation ASLIB 2014_Ghoula
Presentation ASLIB 2014_GhoulaPresentation ASLIB 2014_Ghoula
Presentation ASLIB 2014_GhoulaNizar Ghoula
 
Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...
Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...
Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...Ioannis Partalas
 

Similar to Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtiitling (20)

Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.
 
neural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationneural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classification
 
"Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str...
"Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str..."Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str...
"Isolated Sign Recognition with a Siamese Neural Network of RGB and Depth Str...
 
General Speereo Technology
General Speereo TechnologyGeneral Speereo Technology
General Speereo Technology
 
K translate - Baltic DBIS2016
K translate - Baltic DBIS2016K translate - Baltic DBIS2016
K translate - Baltic DBIS2016
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition system
 
The Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine TranslationThe Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine Translation
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Review
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPC
 
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
 
Environmentally robust ASR front end for DNN-based acoustic models
Environmentally robust ASR front end for DNN-based acoustic modelsEnvironmentally robust ASR front end for DNN-based acoustic models
Environmentally robust ASR front end for DNN-based acoustic models
 
Doktorantūras semināra 3. prezentācija
Doktorantūras semināra 3. prezentācijaDoktorantūras semināra 3. prezentācija
Doktorantūras semināra 3. prezentācija
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training EnsemblesSemi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles
 
Presentation ASLIB 2014_Ghoula
Presentation ASLIB 2014_GhoulaPresentation ASLIB 2014_Ghoula
Presentation ASLIB 2014_Ghoula
 
Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...
Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...
Comparing Named-Entity Recognizers in a Targeted Domain: Handcrafted Rules vs...
 
Study_of_Sequence_labeling_Systems
Study_of_Sequence_labeling_SystemsStudy_of_Sequence_labeling_Systems
Study_of_Sequence_labeling_Systems
 

Recently uploaded

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 

Recently uploaded (20)

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 

Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtiitling

  • 1. 11 Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling Pablo Ruiz, Aitor Álvarez, Haritz Arzelus {pruiz,aalvarez,harzelus}@vicomtech.org Vicomtech-IK4, Donostia/San Sebastián, Spain www.vicomtech.org LREC, Reykjavik 28 May 2014
  • 2. 22 Introduction THE NEED Huge subtitling demand by broadcasters, for accessibility compliance: Automatic subtitling is an attractive option. TALK OUTLINE • Alignment system for automatic subtitling – Phone decoder – Grapheme-to-Phoneme transcriber – Alignment algorithm • Alignment Results applied to Subtitling
  • 3. 33 Alignment approach • Long Audio Alignment: Aligns the audio signal with a human transcript for the audio. • System by Bordel et al. (2012): – Alignment with Hirschberg’s algorithm. – Simple system, but accuracy comparable to more complex approaches (cf. Moreno et al. 1999) – Uses a BINARY scoring matrix to evaluate alignment operations. • Our system: Hirschberg’s algorithm, using NON-BINARY scoring matrices, improving vs. binary
  • 5. 55 Phone Decoder Acoustic Model (AM): HTK – Monophone (18 MFCC + Δ + ΔΔ) Bigram Phoneme Language Model (LM) Sources for Corpora • Spanish: Albayzin, Multext, SAVAS (AM), newspapers (LM) • English: TIMIT (AM), newspapers (LM) Languages Train Test PER Spanish ~15H (LM 45 M) ~5H 40.65% English ~4H (LM 369M) ~1H 35.52%
  • 6. 66 Grapheme-to-Phoneme (G2P) Transcriptors • Spanish: rule-based • English: inferred from CMU Dictionary using Phonetisaurus (WFST) • Phone-sets and details: https://sites.google.com/site/similaritymatrices
  • 7. 77 Alignment Algorithm • Hirschberg’s algorithm for sequence alignment –Dynamic programming, optimization to Needleman- Wunsch, applicable to longer sequences. –Four operations • Insertions • Deletions • Substitutions • Matches –Each operation is evaluated with a scoring function • Binary vs. non-binary • Goal: Promotes aligning equal or similar, easily confusable, phonemes. Prevents aligning dissimilar phonemes. BINARY Gap Penalty Similarity Score NonBinary
  • 9. 99 Similarity Function: Input Features and Values • Phonological similarity: Multivalued features weighted by salience (i.e. impact in similarity) Feature Values (English) Feature Salience
  • 10. 1010 Similarity Function: Definition (cf. Kondrak 2002) 𝜎𝑠𝑢𝑏 𝑝, 𝑞 = ( 𝐶𝑠𝑢𝑏 −𝛿 𝑝, 𝑞 − 𝑉 𝑝 − 𝑉 𝑞 ) / 100 where if 𝑝 = 𝑞 , 𝑉 𝑝 − 𝑉 𝑞 = 0 else 𝑉 𝑥 = 0 if 𝑥 is a consonan𝑡 𝐶𝑣𝑤𝑙 otherwise 𝛿 𝑝, 𝑞 = |diff 𝑝, 𝑞, 𝑓 | × salience(𝑓) 𝑓∈𝑹 𝜎𝑠𝑘𝑖𝑝 𝑝 = |𝐶𝑠𝑘𝑖𝑝 / 100| σsub: score for substituting phoneme p with q 1 2 3 4 Csub = 3500 Cvwl = 1000 Cskip = 10005 σskip: cost of skipping
  • 11.
  • 12.
  • 13.
  • 14. 1414 Evaluation: Method • Alignment accuracy at word level and subtitle level compared to manual subtitles created by professional subtitlers Alignment Test Corpora SPANISH ENGLISH Words 8,774 4,732 Subtitles 1,249 471 Content Clean speech Noisy speech Some reference subtitles missing
  • 15. 1515 Evaluation: Results at Word Level Spanish – Word Level seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0 BinaryBaseline 14.17 57.71 72.65 76.21 79.02 PhonologicalSimilarity 23.01 82.21 93.50 95.50 96.83 English – Word Level seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0 BinaryBaseline 0.28 4.81 19.04 29.69 43.24 PhonologicalSimilarity 1.93 23.76 48.90 61.58 72.72 Percentage of words aligned within a given deviation range from reference +81 +24 +20 +19 +18 1Gain in percentage points +1.6 +19 +30 +32 +29 +81 +24 +20 +19 +18 +1.6 +19 +30 +32 +29
  • 16. 1616 Evaluation: Results at Subtitle Level Spanish – Subtitle Level seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0 BinaryBaseline 10.57 45.08 73.26 95.12 100 PhonologicalSimilarity 18.01 66.45 87.83 98.80 100 English – Subtitle Level seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0 BinaryBaseline 0.21 4.25 37.15 84.29 100 PhonologicalSimilarity 0.42 8.28 46.07 86.84 100 Percentage of subtitles aligned within a given deviation range from reference +71 +21 +14 +3.5 = +0.2 +4 +9 +2.5 = 1Gain in percentage points +71 +21 +14 +3.5 = +0.2 +4 +9 +2.5 =
  • 17. 1717 Conclusions and Further Work • Continuous scoring matrices based on phonological similarity improve alignment compared to a binary matrix. • Further work: Other similarity criteria can also improve results. English – Subtitle Level seconds 0 ≤0.1 ≤0.5 ≤1.0 ≤2.0 Binary 0.21 4.03 36.94 84.08 100 PhonologicalSim 0.42 8.28 43.10 85.99 100 DecoderErrorBased 0.42 11.46 48.83 88.32 100 ICASSP (May 2014), Text, Speech and Dialogue (Sept 2014) +0.2 +4 +6 +2 = +0.2 +7 +12 +4 =