Transcription Errors in Context of Intent Detection and Slot Filling
Raphael Schumann
Institute for Computational Linguistics, Heidelberg University
rschuman@cl.uni-heidelberg.de
15.11.18

Reducing the impact of ASR transcription errors during intent detection and slot filling by robust training of the NLU component.
Outline
1 Introduction
2 Transcription Error and NLU (Schumann and Angkititrakul, 2018)
  Model
  Data
  Baseline
  Evaluation Metrics
  Results
3 End-to-End SLU (Haghani et al., 2018)
  End-to-End Architectures
  Data
  Evaluation Metrics
  Results
4 Compare Results
  Conclusion
  Challenges
Spoken Language Understanding Pipeline
Figure: SLU pipeline (icons: [1])
ASR errors get propagated to the NLU component
ASR as a black box: jointly train an in-domain language model and a robust NLU
ASR trained from scratch: End-to-End SLU model
High Level Architecture
Encoder
bidirectional RNN with LSTM cells
h_t = [fh_t, bh_t] at each timestep t ∈ {1, ..., T_x}
encodes the input sequence x into the vector s_0 [3]: s_0 = tanh(W_s [fh_{T_x}, bh_1])
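A minimal NumPy sketch of this encoder, with plain tanh cells standing in for the LSTM cells; all sizes and weight names are illustrative, not taken from the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_hid, T = 16, 32, 8                   # illustrative dimensions
x = rng.normal(size=(T, d_emb))               # embedded input sequence

Wf = rng.normal(scale=0.1, size=(d_hid, d_emb + d_hid))  # forward cell
Wb = rng.normal(scale=0.1, size=(d_hid, d_emb + d_hid))  # backward cell
Ws = rng.normal(scale=0.1, size=(d_hid, 2 * d_hid))      # decoder-init weights

def rnn(W, inputs):
    """Run a simple tanh recurrence and return all hidden states."""
    h, states = np.zeros(d_hid), []
    for v in inputs:
        h = np.tanh(W @ np.concatenate([v, h]))
        states.append(h)
    return states

fh = rnn(Wf, x)                   # forward states fh_1 .. fh_Tx
bh = rnn(Wb, x[::-1])[::-1]       # backward states bh_1 .. bh_Tx
h = [np.concatenate([f, b]) for f, b in zip(fh, bh)]  # h_t = [fh_t, bh_t]
s0 = np.tanh(Ws @ np.concatenate([fh[-1], bh[0]]))    # s_0 = tanh(W_s [fh_Tx, bh_1])
```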
Intent Decoder
text classification on the encoded input sequence x
intent attention vector c^i: weighted sum over all h_t
intent label y^i predicted by a feed-forward network on [c^i, s_0]
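Continuing the sketch above, the intent decoder reduces to a softmax-weighted sum over the encoder states followed by a linear output layer; the scoring function and layer shapes here are assumptions:

```python
n_intents = 18                                            # ATIS intent count
Wa = rng.normal(scale=0.1, size=(2 * d_hid,))             # attention scorer
Wo = rng.normal(scale=0.1, size=(n_intents, 3 * d_hid))   # output layer

scores = np.array([Wa @ ht for ht in h])
alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()  # softmax weights
ci = sum(a * ht for a, ht in zip(alpha, h))    # intent attention vector c^i
logits = Wo @ np.concatenate([ci, s0])         # feed-forward on [c^i, s_0]
intent = int(np.argmax(logits))                # predicted intent label y^i
```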
Intent Decoder Detail
Word Decoder
language model
RNN with LSTM cell
initial state is set to s_0
Word Decoder
Input at each decoding timestep i:
predicted intent label y^i
attention vector c^w_i: weighted sum over all h_t
previously emitted corrected word y^w_{i-1}
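One decoding step of this word decoder, sketched in NumPy on top of the previous snippets; a tanh cell again stands in for the LSTM, and the embedding tables, sizes, and bilinear attention scorer are illustrative assumptions:

```python
V = 950                                                # ATIS vocabulary size
E = rng.normal(scale=0.1, size=(V, d_emb))             # shared word embeddings
I = rng.normal(scale=0.1, size=(n_intents, d_emb))     # intent label embeddings
Wc = rng.normal(scale=0.1, size=(d_hid, 2 * d_hid))    # attention scorer
Wd = rng.normal(scale=0.1, size=(d_hid, 2 * d_emb + 3 * d_hid))  # decoder cell
Wv = rng.normal(scale=0.1, size=(V, d_hid))            # vocabulary projection

def word_step(s, prev_word, intent):
    scores = np.array([s @ (Wc @ ht) for ht in h])     # attend over all h_t
    alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()
    c = sum(a * ht for a, ht in zip(alpha, h))         # attention vector c^w_i
    inp = np.concatenate([E[prev_word], I[intent], c, s])
    s = np.tanh(Wd @ inp)                              # new decoder state
    p = np.exp(Wv @ s); p /= p.sum()                   # distribution over words
    return s, p
```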
Word Decoder Detail
Word Decoder
learns distribution over possible ASR errors
output sequence is sampled during training
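During training the corrected transcription is sampled from the word decoder's distribution rather than taken greedily, so downstream components see many plausible error patterns; a sketch using word_step from above (the BOS id and length cap are illustrative):

```python
BOS, max_len = 0, 10                   # assumed start-of-sequence id / length cap
s, w, words = s0, BOS, []
for _ in range(max_len):
    s, p = word_step(s, w, intent)
    w = int(rng.choice(V, p=p))        # sample a word instead of taking argmax
    words.append(w)
```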
Word Decoder
encode the new word sequence into hidden states h and s^w_0
encoders share weights
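Weight sharing here just means applying the same encoder matrices to the sampled sequence; continuing the sketch (h2 and sw0 are illustrative names for the new states):

```python
xs = E[words]                                           # embed sampled corrected words
fh2 = rnn(Wf, xs)                                       # same forward weights
bh2 = rnn(Wb, xs[::-1])[::-1]                           # same backward weights
h2 = [np.concatenate([f, b]) for f, b in zip(fh2, bh2)]
sw0 = np.tanh(Ws @ np.concatenate([fh2[-1], bh2[0]]))   # s^w_0
```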
Slot Decoder
Input at each decoding timestep i:
predicted intent label y^i
attention vector c^s_i: weighted sum over all h_t
previously emitted slot token y^s_{i-1}
corrected-word encoder hidden state h_i
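A slot-decoder step then mirrors word_step but additionally consumes the re-encoded state h2[i] at the current position (sketch reusing the snippets above; sizes and names are illustrative):

```python
n_slots = 128                                          # ATIS slot label count
Es = rng.normal(scale=0.1, size=(n_slots, d_emb))      # slot label embeddings
Wd2 = rng.normal(scale=0.1, size=(d_hid, 2 * d_emb + 5 * d_hid))
Wsl = rng.normal(scale=0.1, size=(n_slots, d_hid))     # slot projection

def slot_step(s, prev_slot, intent, i):
    scores = np.array([s @ (Wc @ ht) for ht in h])
    alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()
    c = sum(a * ht for a, ht in zip(alpha, h))         # attention vector c^s_i
    inp = np.concatenate([Es[prev_slot], I[intent], c, s, h2[i]])
    s = np.tanh(Wd2 @ inp)
    p = np.exp(Wsl @ s); p /= p.sum()                  # distribution over slot labels
    return s, p
```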
Slot Decoder Detail
Full Model
language model conditioned on the predicted intent
word embeddings shared across the model
weights shared between both encoders
output of the language model sampled [4] and fed to the second encoder
ATIS
Airline Travel Information Systems (ATIS) dataset [5]
18 different intent labels
128 different slot labels
ATIS Instance
Input:
  words:  show me flights from boston to new york
Labels:
  intent: flight
  slots:  O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name
Hypotheses Extended ATIS
create ASR hypotheses from the audio
add noise to reach an ASR performance of ∼14% word error rate
use the top-3 hypotheses to form new instances
Hypotheses Extended ATIS Instance
Input:
  words:  show flights from boston to no work
Labels:
  intent: flight
  words:  show me flights from boston to new york
  slots:  O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name
Data
            train   dev   test   unique words
ATIS         4085   893    893     950
extended    11841  2583   2606    3178
Baseline 1
Figure: Intent Detection + Slot Filling [6] trained on gold transcription only
Baseline 2
subsequent models:
Figure: Language Model
Figure: Intent Detection + Slot Filling [6]
Evaluation Metrics
WER: word error rate
Slot F1: F1-score following the CoNLL-2000 Chunking Shared Task [7], using the in/out/begin (IOB) schema [8]
Intent Error: percentage of incorrect intent labels
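For reference, WER is the word-level Levenshtein distance between hypothesis and reference, normalized by the reference length; a minimal self-contained sketch:

```python
def wer(ref, hyp):
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions only
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("show me flights from boston to new york",
          "show flights from boston to no work"))   # 3 edits / 8 words = 0.375
```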
Results
Models                                   WER (%)   Slot F1   Intent Error (%)
Joint Slot&Detection                      14.55     84.26       5.80
Language Model + Joint Slot&Detection     10.43     86.85       5.20
Joint Model                               10.55     87.13       5.04
Table: Experimental results on the hypotheses-extended ATIS dataset, averaged over 10 runs.
End-to-End Architectures
Figure: Direct model
P(S|X)
End-to-End Architectures
Figure: Joint model
P(S, W|X) = P(S|W, X)P(W|X)
End-to-End Architectures
Figure: Multitask model
P(S, W|X) = P(S|X)P(W|X)
End-to-End Architectures
Figure: Multistage model
P(S, W|X) = P(S|W)P(W|X)
End-to-End Architectures
Figure: Multistage (Argmax) model
Figure: Multistage (SampledSoftmax) model
Data
human-transcribed Google Home queries
24M train
16K test
5 domains (MEDIA, MEDIA CONTROL, PRODUCTIVITY, DELIGHT, NONE)
20 intents (SET ALARM, SELF NOTE, ...)
2 arguments (DATETIME, SUBJECT)

Transcript → Serialized Semantics
"can you set an alarm for 2 p.m." → <DOMAIN><PRODUCTIVITY><INTENT><SET ALARM><DATETIME>2 p.m.
"remind me to buy milk" → <DOMAIN><PRODUCTIVITY><INTENT><ADD REMINDER><SUBJECT>buy milk
"next song please" → <DOMAIN><MEDIA CONTROL>
"how old is barack obama" → <DOMAIN><NONE>
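The serialized semantics are plain strings, so scoring requires splitting them back into domain, intent, and argument fields; a hypothetical parser for the format as shown above (the regex and dict layout are my assumptions, not the paper's tooling):

```python
import re

def parse_semantics(s):
    """Split a serialized-semantics string into domain, intent, arguments."""
    tags = iter(re.findall(r"<([^>]+)>([^<]*)", s))
    out = {"domain": None, "intent": None, "args": {}}
    for name, value in tags:
        if name == "DOMAIN":
            out["domain"], _ = next(tags)      # next tag carries the domain label
        elif name == "INTENT":
            out["intent"], _ = next(tags)      # next tag carries the intent label
        else:
            out["args"][name] = value.strip()  # e.g. DATETIME -> "2 p.m."
    return out

print(parse_semantics("<DOMAIN><PRODUCTIVITY><INTENT><SET ALARM><DATETIME>2 p.m."))
# {'domain': 'PRODUCTIVITY', 'intent': 'SET ALARM', 'args': {'DATETIME': '2 p.m.'}}
```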
Evaluation Metric
F1 for Domain
F1 for Intent
WER for Arguments
scored on the serialized semantics format shown above
Results
Model                         Domain F1   Intent F1   Arg WER
Baseline                         96.6        95.1      15.04
Direct                           96.2        94.2      18.22
Joint                            96.8        95.7      14.93
Multitask                        96.7        95.8      15.02
Multistage (ArgMax)              96.5        95.4      14.84
Multistage (SampledSoftmax)      96.5        95.2      12.29
Joint Transcript Error Correction and NLU
Figure: Mix of joint and multistage model
P(I, S, W|X) = P(S|W, I) P(W|I, X) P(I|X)
omit the intent decoder and treat it as combined with the slot decoder
Joint Transcript Error Correction and NLU
Figure: Multistage model
P(S, W|X) = P(S|W)P(W|X)
Similarity
Figure: End-to-End SLU
Figure: LM + NLU
P(S, W|X) = P(S|W)P(W|X)
Results
Model                         Intent F1   Arg WER
Baseline                         95.1      15.04
Multistage (ArgMax)              95.4      14.84
Multistage (SampledSoftmax)      95.2      12.29
Table: End-to-End SLU

Model                         Intent Error   Slot F1
Baseline                         5.80         84.26
Multistage (ArgMax)              5.20         86.85
Multistage (SampledSoftmax)      5.04         87.13
Table: LM + NLU
Results
Figure: Joint Transcript Error Correction and NLU
Figure: End-to-End SLU
the word decoder of the first model learns a distribution over possible errors in the transcriptions
the semantic decoder is exposed to a variety of (sampled) incorrect transcriptions
Conclusion
Figure: icons: [1]
bridging the gap between ASR and NLU for
black-box ASR
End-to-End SLU
Conclusion
Figure: icons: [1]
important to train the NLU with sampled transcriptions
learn a distribution over the transcriptions of a "black box" ASR
Differentiable Sampling
Figure: Goyal et al., 2017 [10][11][12]
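The core trick in differentiable scheduled sampling [10] is to replace the hard sample with a Gumbel-softmax relaxation [11][12], so gradients can flow through the sampling step; a minimal sketch reusing p and E from the word-decoder snippets (temperature and usage are illustrative):

```python
def gumbel_softmax(logp, tau=1.0):
    g = -np.log(-np.log(rng.uniform(size=logp.shape)))  # Gumbel(0, 1) noise
    z = (logp + g) / tau
    e = np.exp(z - z.max())
    return e / e.sum()            # soft one-hot; tau -> 0 recovers hard sampling

y = gumbel_softmax(np.log(p))     # p: word distribution from word_step
soft_emb = y @ E                  # differentiable stand-in for embedding a sampled word
```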
Beam Search
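The slide's figures are not reproduced here; as a reference point, a generic beam search over the word decoder from the earlier snippets looks like this (beam width and length are illustrative):

```python
def beam_search(k=3, steps=8):
    beams = [(0.0, s0, BOS, [])]           # (log-prob, state, last word, output)
    for _ in range(steps):
        cand = []
        for lp, s, w, out in beams:
            s2, p = word_step(s, w, intent)
            for v in np.argsort(p)[-k:]:   # top-k continuations per beam
                cand.append((lp + np.log(p[v]), s2, int(v), out + [int(v)]))
        beams = sorted(cand, key=lambda b: b[0])[-k:]   # keep the k best prefixes
    return max(beams, key=lambda b: b[0])[3]
```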
Combined Beam Search 1
Combined Beam Search 2
Thank You!
Bibliography
[1] M. Aguilar, A. Shirazi, and S. Keating, Voice, voice, write (icons).
[2] R. Schumann and P. Angkititrakul, "Incorporating ASR errors with attention-based, jointly trained RNN for intent detection and slot filling," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2018, pp. 6059–6063. doi: 10.1109/ICASSP.2018.8461598.
[3] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," CoRR, vol. abs/1409.0473, 2014. Available: http://arxiv.org/abs/1409.0473.
[4] S. Bengio, O. Vinyals, N. Jaitly, and N. M. Shazeer, "Scheduled sampling for sequence prediction with recurrent neural networks," in Advances in Neural Information Processing Systems (NIPS), 2015. Available: http://arxiv.org/abs/1506.03099.
[5] C. T. Hemphill, J. J. Godfrey, and G. R. Doddington, "The ATIS spoken language systems pilot corpus," in Proceedings of the Workshop on Speech and Natural Language (HLT '90), Hidden Valley, Pennsylvania: Association for Computational Linguistics, 1990, pp. 96–101. doi: 10.3115/116580.116613.
[6] B. Liu and I. Lane, "Attention-based recurrent neural network models for joint intent detection and slot filling," CoRR, vol. abs/1609.01454, 2016. Available: http://arxiv.org/abs/1609.01454.
[7] E. F. Tjong Kim Sang and S. Buchholz, "Introduction to the CoNLL-2000 shared task: Chunking," in Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning (CoNLL '00), Lisbon, Portugal: Association for Computational Linguistics, 2000, pp. 127–132. doi: 10.3115/1117601.1117631.
[8] L. A. Ramshaw and M. P. Marcus, "Text chunking using transformation-based learning," CoRR, vol. cmp-lg/9505040, 1995. Available: http://arxiv.org/abs/cmp-lg/9505040.
[9] P. Haghani, A. Narayanan, M. Bacchiani, G. Chuang, N. Gaur, P. Moreno, R. Prabhavalkar, Z. Qu, and A. Waters, "From audio to semantics: Approaches to end-to-end spoken language understanding," arXiv preprint arXiv:1809.09190, 2018.
[10] K. Goyal, C. Dyer, and T. Berg-Kirkpatrick, "Differentiable scheduled sampling for credit assignment," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 366–371. doi: 10.18653/v1/P17-2058.
[11] C. J. Maddison, A. Mnih, and Y. W. Teh, "The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables," in International Conference on Learning Representations, 2017.
[12] E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-Softmax," in International Conference on Learning Representations, 2017. Available: https://arxiv.org/abs/1611.01144.