Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
[G. Sun, C. Zhang, P. C. Woodland at Interspeech 2022]
@emonosuke
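Background
● Contextual biasing:
○ Feed each speaker's (user's) contextual knowledge into speech recognition
■ e.g. contacts, favorite music, presentation slides
○ Compile that knowledge into a word list (biasing list) → make the words in the list easier to recognize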
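Figure: presentation slides → word list (Interspeech, 二見 (Futami), Sony, …) + end-to-end speech recognition (AED, attention-based encoder-decoder) → recognition result: "Um, Futami will give a presentation on Interspeech"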
Proposed model
● Proposal: apply GNN encodings to the Tree-constrained Pointer Generator (TCPGen)
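Figure: TCPGen attached to an AED speech recognizer; the final output distribution is P = P^gen * P^ptr + (1 - P^gen) * P^mdl
Preparation: build a subword prefix tree from the word list
① From the subwords emitted so far, use the prefix tree to narrow down the words that can validly come next
② Compute P^ptr, masking out everything except those valid words
③ Compute P^gen, which sets the weight of P^ptr in the interpolation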
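Steps ① to ③ can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. The subword segmentation, the `scores` dictionary (a stand-in for the pointer's attention logits), the model distribution `p_mdl`, and all function names are hypothetical. The sketch also assumes that when no valid subword remains, P^gen is forced to 0, so the ASR model's distribution is used unchanged.

```python
import math


def build_prefix_tree(biasing_words):
    """Build a subword prefix tree (trie) as nested dicts.

    biasing_words: one subword sequence per biasing word. The segmentation
    used below is hypothetical; a real system would use the ASR tokenizer.
    """
    root = {}
    for subwords in biasing_words:
        node = root
        for sw in subwords:
            node = node.setdefault(sw, {})
    return root


def valid_next_subwords(tree, prefix):
    """Step 1: follow the subwords emitted so far (within the current word)
    down the tree and return the subwords that may validly come next."""
    node = tree
    for sw in prefix:
        if sw not in node:
            return set()  # the prefix has left the tree
        node = node[sw]
    return set(node)


def masked_pointer_dist(scores, valid):
    """Step 2: softmax over the valid subwords only; all others are masked to 0.

    scores: hypothetical stand-in for the pointer's attention logits.
    """
    valid = {sw for sw in valid if sw in scores}
    if not valid:
        return {sw: 0.0 for sw in scores}
    top = max(scores[sw] for sw in valid)  # subtract the max for numerical stability
    exps = {sw: math.exp(scores[sw] - top) if sw in valid else 0.0 for sw in scores}
    total = sum(exps.values())
    return {sw: e / total for sw, e in exps.items()}


def interpolate(p_ptr, p_mdl, p_gen):
    """Step 3: P = P^gen * P^ptr + (1 - P^gen) * P^mdl."""
    if sum(p_ptr.values()) == 0.0:
        p_gen = 0.0  # assumption: nothing to point at, so use P^mdl alone
    return {sw: p_gen * p_ptr.get(sw, 0.0) + (1.0 - p_gen) * p_mdl[sw] for sw in p_mdl}


# Usage with toy numbers.
tree = build_prefix_tree([["Inter", "speech"], ["Inter", "net"], ["So", "ny"]])
valid = valid_next_subwords(tree, ["Inter"])  # the set {"speech", "net"}
scores = {"Inter": 0.1, "speech": 1.0, "net": 1.0, "So": 0.3, "ny": 0.2, "the": 2.0}
p_mdl = {"Inter": 0.1, "speech": 0.1, "net": 0.1, "So": 0.1, "ny": 0.1, "the": 0.5}
p_ptr = masked_pointer_dist(scores, valid)
p = interpolate(p_ptr, p_mdl, p_gen=0.6)  # p["speech"] = 0.6 * 0.5 + 0.4 * 0.1 = 0.34
```

Note that "the" has the highest raw score but receives zero pointer probability because it is not a valid continuation in the tree. Its final probability comes only from the (1 - P^gen) * P^mdl term.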
Proposed model (continued)
● Same TCPGen pipeline as on the previous slide: ① narrow down the valid words via the prefix tree, ② compute P^ptr with all other words masked, ③ compute P^gen and interpolate P^gen * P^ptr + (1 - P^gen) * P^mdl
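GNN encoding: the nodes of the prefix tree are encoded with a graph neural network
③ Compute P^gen
✓ Because each node's representation reflects not only the current subword but also the whole words that can follow it, P^gen is estimated more accurately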
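The intuition behind the GNN encoding is that a node should summarize every word completion below it. This can be illustrated with a bottom-up, child-sum aggregation over the prefix tree. This is an assumption-laden sketch, not the paper's exact GNN. The scalar weights `w_self` and `w_child`, the tanh nonlinearity, the toy 2-dimensional embeddings, and the function name are all hypothetical. The tree uses a plain nested-dict trie layout.

```python
import math


def encode_tree(tree, embed, w_self=1.0, w_child=0.5):
    """Bottom-up (child-sum) encoding of a nested-dict prefix tree.

    Returns {path: vector} for every non-root node, where path is the tuple
    of subwords from the root. Each node combines its own subword embedding
    with the summed encodings of its children, so it reflects all completions
    below it. w_self, w_child and tanh are illustrative assumptions.
    """
    dim = len(next(iter(embed.values())))
    encodings = {}

    def visit(path, children):
        child_sum = [0.0] * dim
        for sw, sub in children.items():
            h_child = visit(path + (sw,), sub)
            child_sum = [a + b for a, b in zip(child_sum, h_child)]
        own = embed[path[-1]]
        h = [math.tanh(w_self * e + w_child * c) for e, c in zip(own, child_sum)]
        encodings[path] = h
        return h

    for sw, sub in tree.items():
        visit((sw,), sub)
    return encodings


# Toy prefix tree and 2-d embeddings (all values hypothetical).
tree = {"Inter": {"speech": {}, "net": {}}, "So": {"ny": {}}}
embed = {
    "Inter": [1.0, 0.0],
    "speech": [0.0, 1.0],
    "net": [0.0, 0.5],
    "So": [0.5, 0.5],
    "ny": [0.0, 0.0],
}
enc = encode_tree(tree, embed)
# enc[("Inter",)][1] is non-zero only because of its children "speech" and "net".
```

Here the second dimension of the "Inter" node is 0 in its own embedding, so its non-zero encoding comes entirely from the words it can still become. That is the property the slide credits for a more accurate P^gen. How the paper feeds these encodings into P^ptr and P^gen is not shown on the slide.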
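Experiments
● Experiments on Librispeech and the AMI corpus (English)
○ How the word lists (biasing lists) were built:
Librispeech: rare words extracted from each utterance's reference transcript, plus 1000 distractors
AMI: rare words extracted by applying OCR to each meeting's slides (proposed in this work)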
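A Librispeech-style biasing list, consisting of the rare words in the reference transcript plus randomly drawn distractors, might be built as follows. The sketch does not assume a particular definition of "rare" (for example, a frequency cutoff on the training text); the rare-word vocabulary is simply passed in. `build_biasing_list` is a hypothetical helper.

```python
import random


def build_biasing_list(reference, rare_words, n_distractors, seed=0):
    """Hypothetical helper: biasing list for one utterance.

    reference: the utterance's reference transcript.
    rare_words: set of words treated as rare (how "rare" is defined is assumed).
    Returns the rare words that occur in the reference, followed by
    n_distractors other rare words drawn at random.
    """
    in_utt = sorted({w for w in reference.split() if w in rare_words})
    pool = sorted(rare_words - set(in_utt))  # candidates for distractors
    rng = random.Random(seed)  # fixed seed for reproducibility
    distractors = rng.sample(pool, min(n_distractors, len(pool)))
    return in_utt + distractors


rare = {"interspeech", "futami", "sony", "kaldi", "lattice", "woodland"}
ref = "um futami will present on interspeech"
biasing_list = build_biasing_list(ref, rare, n_distractors=2)
```

The slide's Librispeech setup corresponds to `n_distractors=1000` drawn from a rare-word pool large enough to supply them.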
R-WER: Rare word error rate
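R-WER definitions differ slightly between papers. The sketch below takes one common form as an assumption. It aligns reference and hypothesis with a standard Levenshtein alignment. It then counts substitutions or deletions of rare reference words, plus insertions of rare words, and divides by the number of rare words in the reference. Plain WER is returned alongside for comparison. The function names are hypothetical.

```python
def align(ref, hyp):
    """Levenshtein alignment; returns (ref_word or None, hyp_word or None) pairs."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace from the bottom-right corner along an optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    return pairs[::-1]


def wer_and_rwer(reference, hypothesis, rare_words):
    """Return (WER, R-WER) for one utterance under the assumed R-WER definition."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("empty reference")
    pairs = align(ref_words, hypothesis.split())
    errors = sum(r != h for r, h in pairs)
    rare_errors = sum(
        1 for r, h in pairs
        if (r in rare_words and r != h) or (r is None and h in rare_words)
    )
    n_rare = sum(w in rare_words for w in ref_words)
    return errors / len(ref_words), (rare_errors / n_rare if n_rare else 0.0)


wer, rwer = wer_and_rwer(
    "um futami will present on interspeech",
    "um futami will present on inter speech",
    {"futami", "interspeech"},
)
# wer = 2/6 (one substitution, one insertion); rwer = 1/2 ("interspeech" is wrong, "futami" is right)
```

In practice, both rates are accumulated over a whole test set by summing error and word counts before dividing, rather than by averaging per-utterance ratios.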
Figure: results on Librispeech and AMI
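✓ On both Librispeech and AMI, adding GNN encodings improves WER/R-WER over TCPGen.
✓ The gains hold not only for AED but also for RNN-T.
✓ The P^gen plot (right figure) shows that the weighting takes effect at an earlier step.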
