Language Variation in Parliamentary Speeches
First Steps Towards Robust Phoneme Recognition
Why?
● General speech science goals: how do people speak?
● Path towards better training materials, iterative training processes
● SweTerror: multidisciplinary investigation into parliamentary discourse around the topic of terrorism
How much data?
2019-04-09
September 2012 to January 2022:
5925 hours of raw video
Where does “terror” occur?
Transcript files 826
Video files (ASR) 1018
Words (transcripts) 6741
Words (ASR) 7227
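A minimal sketch of how word forms containing "terror" could be counted in a text. The slides do not say how the counts were produced, so the substring match, the function name `count_terror_words`, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_terror_words(text: str, stem: str = "terror") -> Counter:
    """Count lower-cased word forms that contain `stem` (assumed matching rule)."""
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if stem in w)


# Made-up sample sentence, not taken from the Riksdag data.
sample = "Terrorism och terrorhot: terrorister begår terrordåd. Terrorism är ett hot."
counts = count_terror_words(sample)
print(counts.most_common(3))
```

The same function can be run over the official transcripts and over the ASR output to produce two comparable frequency lists.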
Top 10: transcripts
terrorism 1605
terrorister 573
terrorismen 294
terror 273
terrordåd 211
terroristbrott 156
terrorhot 138
terrorattentat 134
terrorbrott 131
terrororganisationer 127
Top 10: ASR
terrorism 1608
terrorister 524
terrorismen 353
terror 291
terrordåd 198
terroristbrott 166
terrorbrott 144
terrorhot 135
terrorattentat 133
terrororganisationer 120
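The two top-10 lists contain the same ten word forms, so they can be compared directly. The sketch below only restates the counts from the two slides and subtracts them; the variable names are illustrative.

```python
# Counts copied from the "Top 10: transcripts" and "Top 10: ASR" slides.
transcripts = {
    "terrorism": 1605, "terrorister": 573, "terrorismen": 294, "terror": 273,
    "terrordåd": 211, "terroristbrott": 156, "terrorhot": 138,
    "terrorattentat": 134, "terrorbrott": 131, "terrororganisationer": 127,
}
asr = {
    "terrorism": 1608, "terrorister": 524, "terrorismen": 353, "terror": 291,
    "terrordåd": 198, "terroristbrott": 166, "terrorbrott": 144,
    "terrorhot": 135, "terrorattentat": 133, "terrororganisationer": 120,
}

# Positive values: the form is more frequent in the ASR output.
diff = {word: asr[word] - transcripts[word] for word in transcripts}
largest = max(diff, key=lambda word: abs(diff[word]))

for word, delta in sorted(diff.items(), key=lambda kv: kv[1]):
    print(f"{word:22s} {delta:+d}")
```

The largest gaps are in "terrorister" and "terrorismen", which move in opposite directions; the slides do not explain why, so any interpretation would be a guess.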
Mundane issues
Files cut off: [Prövning] av förslag till
Normalisation: enligt 11 kap. 3 §
ASR error: IX debatt
ASR error: tack för talman
Normalisation: solved (mostly)
Normalisation
Bakhturina, E., Zhang, Y., Ginsburg, B. (2022). Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization. Proc. Interspeech 2022, 491-495. doi: 10.21437/Interspeech.2022-11074
https://github.com/NVIDIA/NeMo-text-processing
Can we trust the transcripts?
ASR model:
Malmsten, M., Haffenden, C., & Börjeson, L. (2022). Hearing voices at the National Library -- a speech corpus and acoustic model for the Swedish language. http://arxiv.org/abs/2205.03026
https://huggingface.co/KBLab/wav2vec2-large-voxrex-swedish
Some things were never actually said
2442203180006309721 1 230.46 0.06 Jag 1.0 Jag cor
2442203180006309721 1 230.6 0.08 har 1.0 har cor
2442203180006309721 1 230.76 0.2 flera 1.0 flera cor
2442203180006309721 1 231.04 0.3 kollegor 1.0 kollegor ins
2442203180006309721 1 231.4 0.159 här 1.0 <eps> sub
2442203180006309721 1 231.76 0.02 i 1.0 i cor
2442203180006309721 1 232.08 0.399 kammaren 1.0 kammaren cor
2442203180006309721 1 232.479 0.0 <eps> 1.0 som del
2442203180006309721 1 232.479 0.0 <eps> 1.0 inte del
2442203180006309721 1 232.479 0.0 <eps> 1.0 kommer del
2442203180006309721 1 232.479 0.0 <eps> 1.0 från del
2442203180006309721 1 232.479 0.0 <eps> 1.0 Stockholm. del
Phrases move
2442203180006309721 1 1596.22 0.099 ett 1.0 ett cor
2442203180006309721 1 1596.38 0.199 lopp 1.0 lopp cor
2442203180006309721 1 1596.579 0.0 <eps> 1.0 på del
2442203180006309721 1 1596.579 0.0 <eps> 1.0 60 del
2442203180006309721 1 1596.579 0.0 <eps> 1.0 mil del
2442203180006309721 1 1596.72 0.16 över 1.0 över cor
2442203180006309721 1 1597.0 0.119 tre 1.0 tre cor
2442203180006309721 1 1597.32 0.279 dagar 1.0 <eps> ins
2442203180006309721 1 1597.7 0.059 på 1.0 <eps> ins
2442203180006309721 1 1597.9 0.379 sextio 1.0 <eps> ins
2442203180006309721 1 1598.36 0.32 mil 1.0 dagar. sub
Things are added in the moment
2442203180006309721 1 2324.96 0.039 vi 1.0 vi cor
2442203180006309721 1 2325.1 0.32 måste 1.0 <eps> ins
2442203180006309721 1 2325.52 0.159 höra 1.0 <eps> ins
2442203180006309721 1 2325.76 0.319 talas 1.0 <eps> ins
2442203180006309721 1 2326.12 0.039 om 1.0 <eps> ins
2442203180006309721 1 2326.22 0.08 den 1.0 <eps> ins
2442203180006309721 1 2326.34 0.099 här 1.0 <eps> ins
2442203180006309721 1 2326.48 0.5 historien 1.0 <eps> ins
2442203180006309721 1 2327.14 0.34 gång 1.0 gång cor
2442203180006309721 1 2327.58 0.039 på 1.0 på cor
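The alignment rows above can be read mechanically. The sketch below assumes the eight columns are file ID, channel, start time, duration, ASR word, confidence, transcript word, and edit type; this reading is inferred from the values and resembles Kaldi's ctm-edits layout, but the slides do not name the format. The names `Row`, `parse_row`, and `inserted_runs` are illustrative.

```python
from collections import Counter
from typing import NamedTuple


class Row(NamedTuple):
    file_id: str
    channel: str
    start: float      # seconds (assumed)
    duration: float   # seconds (assumed)
    hyp: str          # word from the ASR output
    conf: float
    ref: str          # word from the official transcript
    edit: str         # cor / ins / del / sub


def parse_row(line: str) -> Row:
    f = line.split()
    return Row(f[0], f[1], float(f[2]), float(f[3]), f[4], float(f[5]), f[6], f[7])


def inserted_runs(rows):
    """Join consecutive rows tagged 'ins' into spoken-but-untranscribed phrases."""
    runs, current = [], []
    for row in rows:
        if row.edit == "ins":
            current.append(row.hyp)
        elif current:
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


# Rows copied from the "Things are added in the moment" slide.
LINES = """\
2442203180006309721 1 2324.96 0.039 vi 1.0 vi cor
2442203180006309721 1 2325.1 0.32 måste 1.0 <eps> ins
2442203180006309721 1 2325.52 0.159 höra 1.0 <eps> ins
2442203180006309721 1 2325.76 0.319 talas 1.0 <eps> ins
2442203180006309721 1 2326.12 0.039 om 1.0 <eps> ins
2442203180006309721 1 2326.22 0.08 den 1.0 <eps> ins
2442203180006309721 1 2326.34 0.099 här 1.0 <eps> ins
2442203180006309721 1 2326.48 0.5 historien 1.0 <eps> ins
2442203180006309721 1 2327.14 0.34 gång 1.0 gång cor
2442203180006309721 1 2327.58 0.039 på 1.0 på cor
""".splitlines()

rows = [parse_row(line) for line in LINES]
edit_counts = Counter(row.edit for row in rows)
runs = inserted_runs(rows)
print(edit_counts)
print(runs)
```

Grouping by runs rather than by single words makes whole ad-libbed phrases visible, which is the pattern this slide illustrates.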
What can we find?
False starts:
2442207180019978121 1 1619.5 1.579 oppooppositionens 1.0 oppositionens sub
2442207150019764521 1 213.3 0.839 bberor 1.0 beror sub
2442207160019915621 1 492.7 2.0 globalisglobaliseringen 1.0 globaliseringen sub
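In all three false-start examples the ASR word is the transcript word preceded by a restarted fragment of itself. A minimal heuristic for spotting that pattern is sketched below; the function name `false_start_prefix` and the rule itself are assumptions, not the method used in the talk.

```python
def false_start_prefix(hyp: str, ref: str):
    """Return the restarted fragment if hyp looks like fragment + ref, else None."""
    if hyp == ref or not hyp.endswith(ref):
        return None
    fragment = hyp[: len(hyp) - len(ref)]
    # The fragment must itself be the beginning of the intended word.
    return fragment if ref.startswith(fragment) else None


# (ASR word, transcript word) pairs copied from the slides.
pairs = [
    ("oppooppositionens", "oppositionens"),
    ("bberor", "beror"),
    ("globalisglobaliseringen", "globaliseringen"),
    ("ifrån", "från"),   # not a false start under this rule
]
for hyp, ref in pairs:
    print(hyp, ref, false_start_prefix(hyp, ref))
```

The rule is deliberately strict: "ifrån" versus "från" is rejected because "i" is not the start of "från".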
Alternative pronunciations:
2442207160019915621 1 398.3 0.459 resvasion 1.0 reservation sub
2442207160019915621 1 432.52 0.48 resovationen 1.0 reservationen sub
Filled pauses:
2442207150019764521 1 622.44 0.159 ifrån 1.0 från sub
2442207180020109821 1 326.1 0.099 nhär 1.0 här ins
Wait? Didn’t OpenAI Whisper solve ASR?
Untrustworthy for our purposes
Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C. & Sutskever, I. (2023). Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of Machine Learning Research 202:28492-28518. https://proceedings.mlr.press/v202/radford23a.html
Untrustworthy for our purposes
(We can safely assume that this includes the Riksdag’s data.)
Disappearing “tack”
2442207060018256921 1 13.96 0.199 tack 1.0 <eps> ins
2442207060018256921 1 14.22 0.059 Herr 1.0 Herr cor
2442207060018256921 1 14.36 0.38 talman! 1.0 talman! cor
(The official transcripts only ever start with “Talman!”, “Herr talman!” or “Fru talman!”)
Curious insertions
00:00.000 --> 00:30.000
Tack till mina supporters via www.patreon.com
00:30.000 --> 00:34.000
Tack till mina supporters via www.patreon.com
01:00.000 --> 01:04.000
Tack till mina supporters via www.patreon.com
01:30.000 --> 01:34.000
Tack till mina supporters via www.patreon.com
02:00.000 --> 02:04.000
Tack till mina supporters via www.patreon.com
02:30.000 --> 02:34.000
Tack till mina supporters via www.patreon.com
03:00.000 --> 03:04.000
Tack till mina supporters via www.patreon.com
03:30.000 --> 03:34.000
Tack till mina supporters via www.patreon.com
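The repeated cue above reads, in English, "Thanks to my supporters via www.patreon.com". One simple way to flag output like this is to count identical cue texts. The sketch below assumes the subtitle layout shown on the slide (a timing line followed by one text line); `parse_cues`, `repeated_texts`, and the threshold of three repeats are illustrative choices.

```python
import re
from collections import Counter

# Timing line as it appears on the slide: MM:SS.mmm --> MM:SS.mmm
CUE = re.compile(r"(\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}\.\d{3})")


def parse_cues(text: str):
    """Return (start, end, text) tuples from timing/text line pairs."""
    cues, timing = [], None
    for line in text.splitlines():
        line = line.strip()
        match = CUE.fullmatch(line)
        if match:
            timing = match.groups()
        elif line and timing:
            cues.append((timing[0], timing[1], line))
            timing = None
    return cues


def repeated_texts(cues, min_repeats: int = 3):
    """Cue texts that occur at least `min_repeats` times."""
    counts = Counter(text for _, _, text in cues)
    return {text: n for text, n in counts.items() if n >= min_repeats}


# First four cues copied from the slide.
sample = """\
00:00.000 --> 00:30.000
Tack till mina supporters via www.patreon.com
00:30.000 --> 00:34.000
Tack till mina supporters via www.patreon.com
01:00.000 --> 01:04.000
Tack till mina supporters via www.patreon.com
01:30.000 --> 01:34.000
Tack till mina supporters via www.patreon.com
"""
cues = parse_cues(sample)
suspicious = repeated_texts(cues)
print(suspicious)
```

A repeat count alone cannot prove that a line was never spoken, so flagged cues would still need checking against the audio.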
Alternative? Phonemic recognition
Vaxholm
Waxholm
A dialogue system that gave information on shipping in the Stockholm archipelago, incorporating text-to-speech, ASR, face synthesis, and dialogue management.
However, in the earliest versions ASR was unavailable, so a Wizard of Oz setup was used. The data from these sessions was transcribed at the word and phoneme level, including non-speech events.
Original data
CORRECTED: OK jesper Jesper Hogberg Thu Jun 22 13:46:18 EET 1995
AUTOLABEL: jesper Jesper H|gberg Fri Nov 26 09:25:05 MET 1993
Waxholm dialog. /u/wax/data/scenes/fp2008/fp2008.4.04.smp
WIZARD: joakim_g Joakim Gustafson Wed Nov 24 10:47:29 MET 1993
TEXT:
jag vill }ka fr}n str|mkajen .
PHONEME: J'A:G+ V'IL+ "]:K'A FR']:N+ STR"MhyK'AJEN.
CT 1
Labels: J'A: V'IL "]:KkA F']: STtR"M#Kk`AJE0N .
FR 2500 #J >pm #J >w jag 0.156 sec
FR 3916 $'A: >pm $'A: 0.245 sec
FR 5276 $G >pm $G 0.330 sec
FR 5276 >pm $g+ 0.330 sec
FR 5276 #V >pm #V >w vill 0.330 sec
FR 5919 $'I >pm $'I 0.370 sec
FR 6752 $L >pm $L+ 0.422 sec
FR 7218 #"]: >pm #"]: >w }ka 0.451 sec
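A minimal sketch for reading the annotation file above. It rests on two inferences that the slides do not state: that `}`, `{` and `|` are the Swedish 7-bit (ISO 646 national variant) codes for å, ä and ö, as "H|gberg" and "str|mkajen" suggest, and that the number after `FR` is a sample offset at 16 kHz, which matches the times shown (2500 / 16000 ≈ 0.156 s). The names `decode_text` and `parse_fr` are illustrative.

```python
import re

# Swedish 7-bit character replacements (assumed encoding of the TEXT and >w fields).
SWEDISH_7BIT = str.maketrans(
    {"}": "å", "{": "ä", "|": "ö", "]": "Å", "[": "Ä", "\\": "Ö"}
)

FR_LINE = re.compile(r"^FR\s+(\d+)\s+(.*?)\s+([\d.]+) sec$")
WORD = re.compile(r">w\s+(\S+)")


def decode_text(text: str) -> str:
    """Map 7-bit Swedish codes to å, ä, ö. Only meant for orthographic fields."""
    return text.translate(SWEDISH_7BIT)


def parse_fr(line: str):
    """Return (sample_offset, seconds, word_or_None) for an FR line, else None."""
    match = FR_LINE.match(line.strip())
    if not match:
        return None
    sample, body, seconds = int(match.group(1)), match.group(2), float(match.group(3))
    word = WORD.search(body)
    return sample, seconds, decode_text(word.group(1)) if word else None


print(decode_text("jag vill }ka fr}n str|mkajen ."))
print(parse_fr('FR 7218 #"]: >pm #"]: >w }ka 0.451 sec'))
```

The decoding is applied only to the orthographic fields, because the same characters carry a different meaning inside the phoneme labels.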
Problems
● Frames inconsistently labeled
● “Empty” (zero duration) frames used to mark unrealised segments
● (At least) two schools of thought regarding (generated) phoneme sequences
● Extensive copy-and-edit approach to annotation files (metadata often wrong)
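The zero-duration frames can be found by differencing consecutive offsets. The sketch assumes that each `FR` line marks the start of a segment that runs until the next `FR` line, which is an inference from the "Original data" excerpt; the offsets and labels are copied from that excerpt, and `segment_durations` is an illustrative name.

```python
def segment_durations(frames):
    """frames: (start_offset, label) in file order. Returns (label, length) for all but the last."""
    return [
        (label, next_start - start)
        for (start, label), (next_start, _) in zip(frames, frames[1:])
    ]


# Offsets and labels from the FR lines of the example file.
frames = [
    (2500, "#J"), (3916, "$'A:"), (5276, "$G"), (5276, "$g+"),
    (5276, "#V"), (5919, "$'I"), (6752, "$L"), (7218, '#"]:'),
]
durations = segment_durations(frames)
unrealised = [label for label, length in durations if length == 0]
print(unrealised)
```

Under this reading the final G of "jag" is marked as unrealised, which agrees with the `Labels:` line, where "jag" appears as `J'A:` without a G.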
Results
“terrorstämplade” /tærɔ<pa>stɛempladə/
“Turkiets antiterr- så kallade antiterrorlagstiftning”
/tɵrkiːts<v> antɪt<hes> <pa> soː kalad antɪtærʊrlɑːɡstɪfnɪ/
ASR error: IX debatt: /iː seks debat/
ASR error: tack för talman: /tak fœ̞ː tɑːlman/
It’s not perfect
Transcript: Herr talman! EU-samarbetet gör Sverige starkare och säkrare. Hot som klimatkrisen, pandemier, terrorism och organiserad brottslighet kan inte lösas av ett enskilt land.
KB: är talman eusamarbetet gör sverige starkare och säkrare hotsom klimatkrisen pandemier terrorism och organiserad brottslighet kan inte lösas av ett enskilt land
Phone: hæː tɑː man eːʉːsamabeːtət jœ̞ːr sværjə starkarə oː sɛːkrarə huːtsɔm klɪmɑːtkriːsəm pandemiːər <pa> tærʊrɪsm oː ɔrɡanɪseːrad brɔtslɪheːt kan ɪntə løːsas ɑːv et eːnʂɪlt land
Pauses and hesitations
Transcript: Jag kan bara konstatera att ungefär 200 personer i veckan nekas inträde i Sverige för att de inte har rätt att komma hit, och det upptäcks tack vare de inre gränskontrollerna. Jag kan också konstatera att Säpo gör bedömningen att terrorhotnivån mot Sverige ligger kvar på en trea, vilket är en ganska hög nivå som motiverar ökad säkerhet och inre gränskontroll.
KB: vi gör kan bara konstatera att ungefär tvåhundra personer i veckan som nekas inträde i sverige tack vare och det upptäcks via de inre gränskontrollerna för att de inte har rätt att komma till sverige kan också konstatera att säpo gör bedömningen att terrorhotsnivån mot sverige ligger kvar på en trea vilket är en ganska hög nivå vilket också motiverar ökad säkerhet och även inre gränskontroll
Phone: vɪiːjœ̞ːr <pa> <hes> kam bɑːa kɔnstateːra at ɵŋefæː ʈvoː hɵndra pæʂuːnər <pa> iː vekan sɔm neːkas ɪntrɛːdə iː sværjə <pa> <hes> tak vɑːrə oː deː ɵptɛeks viːa dɔm ɪnrə ɡreɛnskɔntrɔləɳa <pa> <hes> fœ̞ːra tɔm ɪnt ɑː ret at kɔma tɪ sværjə <pa> <hes> kan ɔksɔ kɔnstateːra at sɛːpuː jœ̞ː bedœmnɪŋn at tærɔrhʊtsnɪvoːn mʊt sværjə lɪɡə kvɑːr poː poː en treːa <pa> vɪkət æːr eŋ ɡanska høːɡ<v> nɪvoː <pa> vɪkət ɔksɔ mʊtɪveːrar <hes> øːkad sɛːkərheːt oː ɛːvən ɪndrəe ɡrɛnskɔntrɔl
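The phone-level output mixes phone strings with angle-bracket markers such as `<pa>`, `<hes>` and `<v>`. The sketch below separates the two so that markers can be counted; reading `<pa>` as a pause and `<hes>` as a hesitation follows the slide title, while the meaning of `<v>` is not given in the slides. The function name `split_tags` is illustrative, and the example string is the "Turkiets" result shown earlier.

```python
import re
from collections import Counter

TAG = re.compile(r"<(\w+)>")


def split_tags(phones: str):
    """Return (marker counts, phone string with markers removed)."""
    tags = Counter(TAG.findall(phones))
    plain = " ".join(TAG.sub(" ", phones).split())
    return tags, plain


example = "tɵrkiːts<v> antɪt<hes> <pa> soː kalad antɪtærʊrlɑːɡstɪfnɪ"
tags, plain = split_tags(example)
print(tags)
print(plain)
```

Counting markers per speech or per speaker would be one way to quantify the pauses and hesitations that the official transcripts leave out.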
Alternate pronunciation
Transcript: På den andra sidan har israeliska ungdomars liv präglats av rädsla och oro för terrorattentat. I båda länderna ökar uppgivenheten och radikaliseringen.
KB: på den andra sidan har israeliska ungdomars liv präglats av rädsla och oro för terrorattentat i båda länderna ökar uppgivenheten och radikaliseringen
Phone: poː den andra siːdan oː ɪsraeːlɪska ɵŋdʊmaʂ liːv prɛːɡlas ɑːv rɛːdsla oː uːrʊ fœ̞ː tærɔr atəntɑːt iː boːda lendæɳa øːkar ɵpjiːvənheːtən oː radɪkalɪseːrɪŋən
Ongoing work
● Forced alignment
○ Older, HMM-style models are better at forced alignment
○ Shorter stride (10ms vs 20ms)
○ Dictionary-based
● Acoustically-validated pronunciation dictionary
○ Intersection of dictionary-derived pronunciations and phonemic transcription
○ Adding rule-based alternatives: “rs” can be /ʂ/ or /rs/
○ Dialect-specific lexica (Riksdag speakers are mostly well known)
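The dictionary idea above can be sketched as variant generation followed by an intersection with what the phone recogniser actually produced. The example expands only the “rs” rule from the list. The dictionary form `pærsuːnər` is a hypothetical entry written for illustration, while `pæʂuːnər` is the form that appears in the phone output for "personer"; `rs_variants` is an illustrative name.

```python
from itertools import product


def rs_variants(pron: str) -> set:
    """All pronunciations obtained by realising each 'rs' as either 'rs' or 'ʂ'."""
    parts = pron.split("rs")
    variants = set()
    for choice in product(("rs", "ʂ"), repeat=len(parts) - 1):
        candidate = parts[0]
        for realisation, rest in zip(choice, parts[1:]):
            candidate += realisation + rest
        variants.add(candidate)
    return variants


dictionary_pron = "pærsuːnər"   # hypothetical dictionary-derived form
observed = {"pæʂuːnər"}         # form seen in the phone-level output

candidates = rs_variants(dictionary_pron)
validated = candidates & observed   # acoustically attested variants only
print(sorted(candidates))
print(validated)
```

Keeping only the intersection means a variant enters the dictionary when it is both predicted by a rule and attested in the recordings.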
Wiktionary validations (top 10)
Instances  Word  Pronunciation  Narrow/broad
1161018    att   at             broad
746256     i     iː             broad
582874     det   deː            broad
537306     som   sɔm            broad
512887     på    poː            broad
507377     vi    viː            broad
373091     så    soː            broad
305332     av    ɑːv            broad
291260     om    ɔm             broad
211505     man   man            broad
Questions?