Transcription Errors in Context of Intent Detection and Slot Filling

1. Transcription Errors in Context of Intent Detection and Slot Filling
   Raphael Schumann, Institute for Computational Linguistics, Heidelberg University
   rschuman@cl.uni-heidelberg.de, 15.11.18
2. Outline
   1 Introduction
   2 Transcription Error and NLU (Schumann and Angkititrakul, 2018): Model, Data, Baseline, Evaluation Metrics, Results
   3 End-to-End SLU (Haghani et al., 2018): End-to-End Architectures, Data, Evaluation Metrics, Results
   4 Compare Results, Conclusion, Challenges
4. Spoken Language Understanding Pipeline
   Figure: icons [1]
   ASR errors propagate to the NLU component
   ASR as a black box: jointly train an in-domain language model and a robust NLU model
   ASR trained from scratch: End-to-End SLU model
7. High-Level Architecture
8. Encoder
   bidirectional RNN with LSTM cells
   h_t = [fh_t, bh_t] at each timestep t in {1, ..., T_x}
   encodes the input sequence x into the vector s^t_0 [3]: s^t_0 = tanh(W_s [fh_{T_x}, bh_1])
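As a concrete illustration, here is a minimal PyTorch sketch of such a bidirectional encoder; layer sizes, names, and the assumption of unpadded batches are mine, not details from the paper.

```python
# Minimal sketch of the bidirectional LSTM encoder (assumes PyTorch and
# unpadded batches; dimensions are illustrative, not from the paper).
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        self.W_s = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))        # h: (batch, T_x, 2*hidden)
        hidden = h.size(2) // 2
        fh_last = h[:, -1, :hidden]            # fh_{T_x}: forward state, last step
        bh_first = h[:, 0, hidden:]            # bh_1: backward state, first step
        s0 = torch.tanh(self.W_s(torch.cat([fh_last, bh_first], dim=-1)))
        return h, s0                           # per-step states h_t and s^t_0
```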
9. Intent Decoder
   text classification on the encoded input sequence x
   intent attention vector c^i: weighted sum over all h_t
   intent label y^i predicted by a feed-forward network on [c^i, s^t_0]
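A hedged sketch of this decoder follows; the single-layer scoring function is a simplification and may differ from the exact attention form used in the paper.

```python
# Sketch of the intent decoder: attention over encoder states h_t,
# then a feed-forward classifier on [c^i, s^t_0]. Simplified scoring.
import torch
import torch.nn as nn

class IntentDecoder(nn.Module):
    def __init__(self, hidden_dim, num_intents):
        super().__init__()
        enc_dim = 2 * hidden_dim                     # bidirectional states
        self.score = nn.Linear(enc_dim + hidden_dim, 1)
        self.classify = nn.Linear(enc_dim + hidden_dim, num_intents)

    def forward(self, h, s0):
        # h: (batch, T_x, enc_dim); s0: (batch, hidden_dim)
        query = s0.unsqueeze(1).expand(-1, h.size(1), -1)
        alpha = torch.softmax(
            self.score(torch.cat([h, query], dim=-1)).squeeze(-1), dim=-1)
        c_i = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)   # weighted sum of h_t
        return self.classify(torch.cat([c_i, s0], dim=-1))  # intent logits y^i
```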
10. Intent Decoder Detail
11. Intent Decoder
12. Word Decoder
13. Word Decoder
    language model: RNN with an LSTM cell
    initial state set to s^t_0
14. Word Decoder
    Input at each decoding timestep i:
    predicted intent label y^i
    attention vector c^w_i: weighted sum over all h_t
    previously emitted corrected word y^w_{i-1}
15. Word Decoder Detail
16. Word Decoder
17. Word Decoder
18. Word Decoder
    learns a distribution over possible ASR errors
    the output sequence is sampled during training
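In the spirit of scheduled sampling [4], training-time sampling could look like the sketch below; `decoder_step` is a hypothetical stand-in for one step of the word decoder, not an API from the paper.

```python
# Illustrative rollout that samples each word from the decoder's softmax
# instead of taking the argmax or the gold token.
import torch

def sample_corrected_sequence(decoder_step, s0, max_len, sos_id):
    """Roll out the word decoder, sampling each token from its softmax."""
    state = s0
    token = torch.full((s0.size(0),), sos_id, dtype=torch.long)
    tokens = []
    for _ in range(max_len):
        logits, state = decoder_step(token, state)   # one LM decoding step
        probs = torch.softmax(logits, dim=-1)
        token = torch.multinomial(probs, 1).squeeze(-1)  # sampled word id
        tokens.append(token)
    return torch.stack(tokens, dim=1)                # (batch, max_len)
```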
19. Word Decoder
    encodes the new word sequence into hidden states h_t and s^w_0
    the two encoders share weights
20. Word Decoder
21. Slot Decoder
22. Slot Decoder
    Input at each decoding timestep i:
    predicted intent label y^i
    attention vector c^s_i: weighted sum over all h_t
    previously emitted slot token y^s_{i-1}
    corrected-word encoder hidden state h_i
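A minimal sketch of one slot-decoder step under these assumptions; the concatenation order and all dimensions are illustrative, not taken from the paper.

```python
# One slot-decoder step: the four inputs listed above are concatenated
# and fed to an LSTM cell; a linear layer emits slot logits.
import torch
import torch.nn as nn

class SlotDecoderStep(nn.Module):
    def __init__(self, hidden_dim, num_slots, intent_dim, slot_emb_dim):
        super().__init__()
        enc_dim = 2 * hidden_dim                      # bidirectional encoder states
        in_dim = intent_dim + enc_dim + slot_emb_dim + enc_dim
        self.cell = nn.LSTMCell(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_slots)

    def forward(self, intent_emb, c_s, prev_slot_emb, h_i, state):
        # intent_emb: embedding of y^i; c_s: attention vector c^s_i;
        # prev_slot_emb: embedding of y^s_{i-1};
        # h_i: corrected-word encoder state at position i
        x = torch.cat([intent_emb, c_s, prev_slot_emb, h_i], dim=-1)
        hx, cx = self.cell(x, state)
        return self.out(hx), (hx, cx)                 # slot logits, new state
```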
23. Slot Decoder Detail
24. Full Model
    language model conditioned on the predicted intent
    word embeddings shared across the model
    weights shared between both encoders
    sampled [4] LM output fed into the second encoder
26. ATIS
    Airline Travel Information Systems (ATIS) dataset [5]
    18 different intent labels
    128 different slot labels
27. ATIS Instance
    Input:  words   show me flights from boston to new york
    Labels: intent  flight
            slots   O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name
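The same instance written as an aligned Python record, to make the one-slot-label-per-token alignment explicit:

```python
# The ATIS instance above; slot labels align one-to-one with words.
instance = {
    "words":  ["show", "me", "flights", "from", "boston", "to", "new", "york"],
    "intent": "flight",
    "slots":  ["O", "O", "O", "O", "B-fromloc.city_name",
               "O", "B-toloc.city_name", "I-toloc.city_name"],
}
assert len(instance["words"]) == len(instance["slots"])
```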
28. Hypotheses Extended ATIS
    create ASR hypotheses from the audio
    add noise to reach an ASR performance of ~14% word error rate
    use the top-3 hypotheses to form new instances
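One plausible way to assemble such instances is sketched below; the field names and pairing logic are assumptions based on the slide, not code from the paper.

```python
# Hypothetical sketch: pair each of the top-k ASR hypotheses with the
# gold transcript and its labels to form one new training instance each.
def extend_with_hypotheses(instance, asr_hypotheses, k=3):
    """Create one training instance per top-k ASR hypothesis."""
    return [
        {
            "words": hyp,                     # possibly erroneous ASR output
            "gold_words": instance["words"],  # corrected target transcript
            "intent": instance["intent"],
            "slots": instance["slots"],       # aligned to the gold transcript
        }
        for hyp in asr_hypotheses[:k]
    ]
```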
29. Hypotheses Extended ATIS Instance
    Input:  words   show flights from boston to no work
    Labels: intent  flight
            words   show me flights from boston to new york
            slots   O O O O B-fromloc.city_name O B-toloc.city_name I-toloc.city_name
30. Data
                train    dev   test   unique words
    ATIS         4085    893    893            950
    extended    11841   2583   2606           3178
32. Baseline 1
    Figure: Intent Detection + Slot Filling [6]
    trained on the gold transcriptions only
33. Baseline 2
    two models applied in sequence:
    Figure: Language Model
    Figure: Intent Detection + Slot Filling [6]
35. Evaluation Metrics
    WER: word error rate
    Slot F1: F1-score following the CoNLL-2000 Chunking Shared Task [7], using the in/out/begin (IOB) scheme [8]
    Intent Error: percentage of incorrect intent labels
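For reference, WER is the word-level edit distance divided by the reference length. A minimal implementation of the standard definition (not the authors' evaluation code), applied to the example pair from the instance slides:

```python
# Word error rate: Levenshtein distance over words / reference length.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("show me flights from boston to new york",
          "show flights from boston to no work"))  # 3 edits / 8 words = 0.375
```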
37. Results
    Models                                   WER (%)   Slot (F1)   Intent Error (%)
    Joint Slot&Detection                       14.55       84.26               5.80
    Language Model + Joint Slot&Detection      10.43       86.85               5.20
    Joint Model                                10.55       87.13               5.04
    Table: Experimental results on the hypotheses-extended ATIS dataset, averaged over 10 runs.
40. End-to-End Architectures
    Figure: Direct model, P(S|X)
41. End-to-End Architectures
    Figure: Joint model, P(S, W|X) = P(S|W, X) P(W|X)
42. End-to-End Architectures
    Figure: Multitask model, P(S, W|X) = P(S|X) P(W|X)
43. End-to-End Architectures
    Figure: Multistage model, P(S, W|X) = P(S|W) P(W|X)
44. End-to-End Architectures
    Figure: Multistage (ArgMax) model
    Figure: Multistage (SampledSoftmax) model
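The difference between the two variants comes down to how the first stage's word logits become the second stage's input tokens. A hedged sketch (function and argument names are illustrative, not from the paper):

```python
# Turn first-stage word logits into second-stage input tokens: argmax
# passes the most likely word; sampling exposes the semantics stage to
# ASR-like errors.
import torch

def stage_interface(word_logits, mode="argmax"):
    probs = torch.softmax(word_logits, dim=-1)   # (batch, T, vocab)
    if mode == "argmax":
        return probs.argmax(dim=-1)              # most likely word per step
    b, t, v = probs.shape                        # sample one word per step
    return torch.multinomial(probs.view(-1, v), 1).view(b, t)
```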
46. Data
    human-transcribed Google Home queries: 24M train, 16K test
    5 domains (MEDIA, MEDIA CONTROL, PRODUCTIVITY, DELIGHT, NONE)
    20 intents (SET ALARM, SELF NOTE, ...)
    2 arguments (DATETIME, SUBJECT)

    Transcript                           Serialized Semantics
    "can you set an alarm for 2 p.m."    <DOMAIN><PRODUCTIVITY><INTENT><SET ALARM><DATETIME>2 p.m.
    "remind me to buy milk"              <DOMAIN><PRODUCTIVITY><INTENT><ADD REMINDER><SUBJECT>buy milk
    "next song please"                   <DOMAIN><MEDIA CONTROL>
    "how old is barack obama"            <DOMAIN><NONE>
48. Evaluation Metrics
    F1 for Domain
    F1 for Intent
    WER for Arguments
    (illustrated on the same transcript / serialized-semantics examples as on the Data slide)
50. Results
    Model                         Domain F1   Intent F1   Arg WER
    Baseline                           96.6        95.1     15.04
    Direct                             96.2        94.2     18.22
    Joint                              96.8        95.7     14.93
    Multitask                          96.7        95.8     15.02
    Multistage (ArgMax)                96.5        95.4     14.84
    Multistage (SampledSoftmax)        96.5        95.2     12.29
53. Joint Transcript Error Correction and NLU
    Figure: Mix of joint and multistage model, P(I, S, W|X) = P(S|W, I) P(W|I, X) P(I|X)
    omit the intent decoder and pretend it is combined with the slot decoder
54. Joint Transcript Error Correction and NLU
    Figure: Multistage model, P(S, W|X) = P(S|W) P(W|X)
55. Similarity
    Figure: End-to-End SLU
    Figure: LM + NLU
    P(S, W|X) = P(S|W) P(W|X)
56. Similarity
57. Results
    Model                         Intent F1   Arg WER
    Baseline                           95.1     15.04
    Multistage (ArgMax)                95.4     14.84
    Multistage (SampledSoftmax)        95.2     12.29
    Table: End-to-End SLU

    Model                         Intent Error   Slot F1
    Baseline                              5.80     84.26
    Multistage (ArgMax)                   5.20     86.85
    Multistage (SampledSoftmax)           5.04     87.13
    Table: LM + NLU
58. Results
    Figure: Joint Transcript Error Correction and NLU
    Figure: End-to-End SLU
    the word decoder of the first model learns a distribution over possible errors in the transcriptions
    the semantic decoder is exposed to a variety of (sampled) incorrect transcriptions
60. Conclusion
    Figure: icons [1]
    bridging the gap between ASR and NLU for black-box ASR
    End-to-End SLU
61. Conclusion
    Figure: icons [1]
    important to train NLU with sampled transcriptions
    learn the distribution over transcriptions of a "black box" ASR
63. Differentiable Sampling
    Figure: Goyal et al., 2017 [10][11][12]
64. Differentiable Sampling
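A minimal Gumbel-softmax sketch [11][12] of the relaxation behind differentiable scheduled sampling [10]; the temperature value is a typical choice, not one taken from the slides.

```python
# Gumbel-softmax: a differentiable, soft approximation to sampling a
# discrete token, so gradients can flow through the sampling step.
import torch

def gumbel_softmax_sample(logits, temperature=1.0):
    """Soft one-hot sample; gradients flow through the softmax."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return torch.softmax((logits + gumbel) / temperature, dim=-1)

# A soft sample can feed the next step as a weighted mix of embeddings
# instead of a hard token id, e.g.:
#   soft_emb = gumbel_softmax_sample(logits) @ embedding.weight
```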
65. Beam Search
66. Beam Search
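Since the beam-search figures did not survive extraction, here is a generic sketch of the procedure; `step` is a hypothetical callable returning (token, log-probability) expansions for a prefix, not an API from the talk.

```python
# Generic beam search: keep the beam_size highest-scoring prefixes,
# expanding unfinished ones at every step.
def beam_search(step, sos_id, eos_id, beam_size=3, max_len=20):
    beams = [([sos_id], 0.0)]                    # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == eos_id:             # keep finished hypotheses
                candidates.append((prefix, score))
                continue
            for token, logp in step(prefix):     # expand each open beam
                candidates.append((prefix + [token], score + logp))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]                           # best-scoring sequence
```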
67. Combined Beam Search 1
71. Combined Beam Search 2
73. Thank You!
74. Bibliography I
    [1] M. Aguilar, A. Shirazi, and S. Keating, Voice, voice, write,
    [2] R. Schumann and P. Angkititrakul, "Incorporating ASR errors with attention-based, jointly trained RNN for intent detection and slot filling," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2018, pp. 6059-6063. doi: 10.1109/ICASSP.2018.8461598.
    [3] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," CoRR, vol. abs/1409.0473, 2014. [Online]. Available: http://arxiv.org/abs/1409.0473.
    [4] S. Bengio, O. Vinyals, N. Jaitly, and N. M. Shazeer, "Scheduled sampling for sequence prediction with recurrent neural networks," in Advances in Neural Information Processing Systems, NIPS, 2015. [Online]. Available: http://arxiv.org/abs/1506.03099.
75. Bibliography II
    [5] C. T. Hemphill, J. J. Godfrey, and G. R. Doddington, "The ATIS spoken language systems pilot corpus," in Proceedings of the Workshop on Speech and Natural Language, ser. HLT '90, Hidden Valley, Pennsylvania: Association for Computational Linguistics, 1990, pp. 96-101. doi: 10.3115/116580.116613.
    [6] B. Liu and I. Lane, "Attention-based recurrent neural network models for joint intent detection and slot filling," CoRR, vol. abs/1609.01454, 2016. [Online]. Available: http://arxiv.org/abs/1609.01454.
76. Bibliography III
    [7] E. F. Tjong Kim Sang and S. Buchholz, "Introduction to the CoNLL-2000 shared task: Chunking," in Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, ser. ConLL '00, Lisbon, Portugal: Association for Computational Linguistics, 2000, pp. 127-132. doi: 10.3115/1117601.1117631.
    [8] L. A. Ramshaw and M. P. Marcus, "Text chunking using transformation-based learning," CoRR, vol. cmp-lg/9505040, 1995. [Online]. Available: http://arxiv.org/abs/cmp-lg/9505040.
    [9] P. Haghani, A. Narayanan, M. Bacchiani, G. Chuang, N. Gaur, P. Moreno, R. Prabhavalkar, Z. Qu, and A. Waters, "From audio to semantics: Approaches to end-to-end spoken language understanding," arXiv preprint arXiv:1809.09190, 2018.
77. Bibliography IV
    [10] K. Goyal, C. Dyer, and T. Berg-Kirkpatrick, "Differentiable scheduled sampling for credit assignment," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 366-371. doi: 10.18653/v1/P17-2058.
    [11] C. J. Maddison, A. Mnih, and Y. W. Teh, "The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables," in International Conference on Learning Representations, 2017.
    [12] E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-Softmax," in International Conference on Learning Representations, 2017. [Online]. Available: https://arxiv.org/abs/1611.01144.