
Nonparametric Bayesian Word Discovery for Symbol Emergence in Robotics

This is material for an invited talk at the Workshop on Machine Learning Methods for High-Level Cognitive Capabilities in Robotics 2016 (ML-HLCR2016), held at IROS 2016, Korea.


  1. Nonparametric Bayesian Word Discovery for Symbol Emergence in Robotics. Tadahiro Taniguchi, College of Information Science & Engineering, Ritsumeikan University. Invited talk @ Workshop on Machine Learning Methods for High-Level Cognitive Capabilities in Robotics 2016 (ML-HLCR2016), in IROS2016, Daejeon, Korea, 13/10/2016. @tanichu
  2. Contents: 1. Introduction 2. Word segmentation and discovery 3. Nonparametric Bayesian double articulation analyzer (NPB-DAA) 4. Conclusion. Tadahiro Taniguchi, Takayuki Nagai, Tomoaki Nakamura, Naoto Iwahashi, Tetsuya Ogata, and Hideki Asoh, "Symbol Emergence in Robotics: A Survey," Advanced Robotics (2016). DOI:10.1080/01691864.2016.1164622 @tanichu
  3. Unsupervised machine learning for language acquisition by a robot: without any pre-existing knowledge of phonemes and vocabularies (like human infants) [D. Roy 2002] [N. Iwahashi 2003]. D. K. Roy and A. P. Pentland, "Learning words from sights and sounds: a computational model," Cognitive Science, vol. 26, no. 1, pp. 113–146, 2002. N. Iwahashi, "Language acquisition through a human–robot interface by combining speech, visual, and behavioral information," vol. 156, pp. 109–121, 2003.
  4. Tadahiro Taniguchi, Takayuki Nagai, Tomoaki Nakamura, Naoto Iwahashi, Tetsuya Ogata, and Hideki Asoh, "Symbol Emergence in Robotics: A Survey," Advanced Robotics (2016). DOI:10.1080/01691864.2016.1164622
  5. Symbol emergence system
  6. Word discovery for symbol emergence in robotics • To enable a robot to obtain the many words provided by its user through human-robot interaction, word discovery is a critical task.
  7. Contents: 1. Introduction 2. Word segmentation and discovery 3. Nonparametric Bayesian double articulation analyzer (NPB-DAA) 4. Conclusion @tanichu
  8. Word segmentation in language acquisition • When parents speak to their children, they rarely use "isolated words," but rather continuous word sequences, i.e., sentences. • Word segmentation is a primary task of language acquisition. • A child has to perform word segmentation without pre-existing knowledge of vocabulary, because children do not know lists of words before they learn them. (e.g., THISISANAPPLE, APPLEISSOSWEET, HEYLOOKATTHIS, OHYOUARESOCUTE)
  9. Unsupervised word segmentation • Word segmentation problem – Example: Thisisanapple (ðɪsɪzənæpl) -> This (ðɪs) is (ɪz) an (ən) apple (æpl); WATASHIWATANAKADESU (わたしはたなかです) -> WATASHI (わたし) WA (は) TANAKA (たなか) DESU (です) – The task is to segment sentences into words (morphemes). – This traditionally required pre-existing knowledge of a language model, i.e., a dictionary. • Unsupervised word segmentation – No pre-existing dictionaries are used. – A nonparametric Bayesian framework for word segmentation [Goldwater+ 09]. – An unsupervised word segmentation method based on the Nested Pitman–Yor language model (NPYLM) [Mochihashi+ 09]. S. Goldwater, T. L. Griffiths, and M. Johnson, "A Bayesian framework for word segmentation: exploring the effects of context," Cognition, vol. 112, no. 1, pp. 21–54, 2009. D. Mochihashi, T. Yamada, and N. Ueda, "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling," ACL-IJCNLP 2009, pp. 100–108, 2009.
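As a minimal illustration of the segmentation task itself (not of the Bayesian models cited above), segmenting with a known dictionary is a dynamic-programming problem. The word probabilities below are toy values chosen for illustration:

```python
import math

# Hypothetical unigram word probabilities; in the Bayesian models above
# these would be learned, here they are fixed toy values.
word_probs = {
    "this": 0.2, "is": 0.2, "an": 0.1, "apple": 0.1,
    "a": 0.05, "nap": 0.02, "thi": 0.01, "sis": 0.01,
}

def segment(s, max_len=8):
    """Viterbi segmentation: most probable split of s into known words.

    Assumes s can be fully covered by dictionary words.
    """
    n = len(s)
    best = [(-math.inf, None)] * (n + 1)   # (log-prob, backpointer)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            w = s[j:i]
            if w in word_probs and best[j][0] > -math.inf:
                score = best[j][0] + math.log(word_probs[w])
                if score > best[i][0]:
                    best[i] = (score, j)
    words, i = [], n                        # backtrack
    while i > 0:
        j = best[i][1]
        words.append(s[j:i])
        i = j
    return list(reversed(words))

print(segment("thisisanapple"))  # → ['this', 'is', 'an', 'apple']
```

The unsupervised methods above remove exactly this reliance on a given dictionary: the word inventory itself must be inferred.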
  10. NPYLM [Mochihashi '09] (Nested Pitman-Yor Language Model) • Mochihashi et al. proposed NPYLM for unsupervised word segmentation. • NPYLM has a word n-gram model and a letter n-gram model, each adopting the hierarchical Pitman-Yor language model. • Bayesian nonparametrics. • Efficient blocked Gibbs sampler alternating word segmentation with updates of the language model (vocabulary). D. Mochihashi, T. Yamada, and N. Ueda, "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling," ACL-IJCNLP 2009, pp. 100–108, 2009.
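The blocked Gibbs loop alternates between removing one sentence's words from the model, resampling that sentence's segmentation given the rest, and adding the new words back. A crude sketch of that loop, with a smoothed unigram model standing in for the nested Pitman-Yor hierarchy (all constants and the spelling model are illustrative assumptions, not NPYLM's actual components):

```python
import math, random
from collections import Counter

random.seed(0)

ALPHABET = 5   # toy phoneme inventory size (cf. the five-vowel experiment)
P_END = 0.3    # geometric word-length penalty in the toy spelling model
CONC = 1.0     # concentration weight for unseen words

def base_prob(w):
    # Spelling model: i.i.d. letters plus geometric length prior.
    return ((1 - P_END) / ALPHABET) ** len(w) * P_END

class UnigramModel:
    """Stand-in for NPYLM's word model: counts smoothed by base_prob."""
    def __init__(self):
        self.counts, self.total = Counter(), 0
    def prob(self, w):
        return (self.counts[w] + CONC * base_prob(w)) / (self.total + CONC)
    def add(self, words):
        for w in words:
            self.counts[w] += 1
            self.total += 1
    def remove(self, words):
        for w in words:
            self.counts[w] -= 1
            self.total -= 1

def sample_segmentation(s, model, max_len=6):
    # Forward filtering: fwd[i] = total probability of s[:i].
    n = len(s)
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            fwd[i] += fwd[j] * model.prob(s[j:i])
    # Backward sampling of word boundaries.
    words, i = [], n
    while i > 0:
        lo = max(0, i - max_len)
        weights = [fwd[j] * model.prob(s[j:i]) for j in range(lo, i)]
        j = random.choices(range(lo, i), weights=weights)[0]
        words.append(s[j:i])
        i = j
    return list(reversed(words))

sentences = ["aioiao", "aoaioi", "aioiaioi", "aoao"]
model = UnigramModel()
segs = [[s] for s in sentences]      # init: each sentence is one word
for seg in segs:
    model.add(seg)

for _ in range(50):                  # blocked Gibbs sweeps
    for k, s in enumerate(sentences):
        model.remove(segs[k])        # leave this sentence out
        segs[k] = sample_segmentation(s, model)
        model.add(segs[k])

for s, seg in zip(sentences, segs):
    print(s, "->", seg)
```

The real NPYLM replaces the unigram stand-in with nested hierarchical Pitman-Yor word and letter n-grams, which is what makes the vocabulary grow and shrink in a principled way.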
  11. (Analysis figure from Mochihashi's presentation slides: http://chasen.org/~daiti-m/paper/jfssa2009segment.pdf)
  12. Problems with unsupervised word segmentation in word discovery tasks • NPYLM presumes that the target document (sentences) is transcribed without errors. – If there are phoneme recognition errors, its performance degrades dramatically. – How to mitigate the effect of phoneme recognition errors in word discovery is an important issue in real-world language acquisition. [Saffran 1996]
  13. Problem and approach: from continuous speech signals, learn an acoustic model (single Gaussian emission distributions with duration distributions) and a language model (word dictionary and bigram language model) within a single nonparametric Bayesian probabilistic generative model, by unsupervised learning.
  14. Contents: 1. Introduction 2. Word segmentation and discovery 3. Nonparametric Bayesian double articulation analyzer (NPB-DAA) 4. Conclusion. 1. Tadahiro Taniguchi, Shogo Nagasaka, and Ryo Nakashima, "Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals," IEEE Transactions on Cognitive and Developmental Systems (2016). 2. Tadahiro Taniguchi, Ryo Nakashima, Hailong Liu, and Shogo Nagasaka, "Double Articulation Analyzer with Deep Sparse Autoencoder for Unsupervised Word Discovery from Speech Signals," Advanced Robotics, vol. 30, no. 11–12, pp. 770–783 (2016). @tanichu
  15. Double articulation structure in semiotic data • Semiotic time-series data often has a double articulation structure. – A speech signal is a continuous, high-dimensional time series (unsegmented). – A spoken sentence is considered a sequence of phonemes (meaningless in isolation). – Phonemes are grouped into words, and people give them meanings (semantic units). Example: "How much is this?" -> phonemes h a u m ʌ́ tʃ i z ð í s -> words [h a u] [m ʌ́ tʃ] [i z] [ð í s]. Does the human brain have a special capability to analyze double articulation structures embedded in time-series data?
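The two-layer structure can be made concrete with a toy representation, following the slide's "How much is this?" example (accent marks dropped for simplicity):

```python
# Double articulation: meaningless phonemes at the bottom layer,
# meaningful words formed by grouping them at the layer above.
phonemes = ["h", "a", "u", "m", "ʌ", "tʃ", "i", "z", "ð", "i", "s"]
word_spans = [(0, 3), (3, 6), (6, 8), (8, 11)]  # index ranges into phonemes
words = ["".join(phonemes[i:j]) for i, j in word_spans]
print(words)  # → ['hau', 'mʌtʃ', 'iz', 'ðis']
```

NPB-DAA has to infer both layers at once: the phoneme inventory, the phoneme boundaries, and the word spans over them.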
  16. Working hypothesis: double articulation structure in human behavior (speech, motion, driving). Example: W H A T I S T H I S T H I S I S A P E N -> [WHAT] [IS] [THIS] [THIS] [IS] [A] [PEN]
  17. Double Articulation Analyzer (DAA) and its application to non-speech time-series data. Double Articulation Analyzer = sticky HDP-HMM + NPYLM • sticky HDP-HMM = nonparametric Bayesian HMM [Fox '07] • NPYLM = nonparametric Bayesian language model for unsupervised morphological analysis [Mochihashi '09]. Tadahiro Taniguchi and Shogo Nagasaka, "Double Articulation Analyzer for Unsegmented Human Motion using Pitman-Yor Language Model and Infinite Hidden Markov Model," 2011 IEEE/SICE SII (2011). Applications to human motion and driving behavior: imitation learning and motion segmentation [Taniguchi '11], extracting driving chunks [Nagasaka '12], detecting intentional changing points [Takenaka '12], prediction [Taniguchi '12], video summarization [Takenaka '12], topic modeling [Bando '13].
  18. Simultaneous acquisition of phoneme and language models • [Nakamura+ 2014] used a pre-existing phoneme model and did not make a robot learn a phoneme model. • There are still few studies on unsupervised simultaneous learning of phoneme and language models from speech signals [Kamper+ 15, Lee+ 15]. • Does analyzing the double articulation structure embedded in speech signals enable a robot to obtain phoneme and language models simultaneously, making full use of prosodic, distributional, and co-occurrence cues directly from speech signals? H. Kamper, A. Jansen, and S. Goldwater, "Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model," INTERSPEECH 2015. C.-y. Lee, T. J. O'Donnell, and J. Glass, "Unsupervised Lexicon Discovery from Acoustic Input," Transactions of the Association for Computational Linguistics, vol. 3, pp. 389–403, 2015.
  19. Hierarchical Dirichlet process hidden language model (HDP-HLM) [Taniguchi+ 16]: a probabilistic generative model for time-series data having a double articulation structure. (Graphical model: a language model — a word bigram model with an embedded letter bigram model — generates a latent word sequence (super state sequence) and latent letter sequences with duration variables; an acoustic model (phoneme model) generates the observations.) Tadahiro Taniguchi, Shogo Nagasaka, and Ryo Nakashima, "Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals," IEEE Transactions on Cognitive and Developmental Systems (2016).
  20. HDP-HLM as an extension of HDP-HSMM • HDP-HLM can be regarded as an extension of the HDP-HSMM (hierarchical Dirichlet process hidden semi-Markov model) [Johnson '13]. • This property helps us derive an efficient inference procedure. M. J. Johnson and A. S. Willsky, "Bayesian nonparametric hidden semi-Markov models," The Journal of Machine Learning Research, vol. 14, no. 1, pp. 673–701, 2013.
  21. Inference (blocked Gibbs sampler) • A blocked Gibbs sampler can be derived by extending the HDP-HMM's backward filtering-forward sampling algorithm: backward filtering, forward sampling, then parameter update. (The backward filtering step is computationally very heavy.)
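Backward filtering-forward sampling can be sketched for a plain discrete HMM; the HDP-HSMM/HDP-HLM sampler extends this scheme with duration variables and nonparametric priors. All parameters below are toy values, not from the paper:

```python
import random

random.seed(1)

# Toy 2-state HMM over a binary observation alphabet.
A = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities
B = [[0.7, 0.3], [0.1, 0.9]]   # emission probabilities
pi = [0.5, 0.5]                # initial distribution
obs = [0, 0, 1, 1, 1, 0]

T, K = len(obs), len(pi)

# Backward filtering: beta[t][k] = p(y_{t+1:T} | z_t = k).
beta = [[1.0] * K for _ in range(T)]
for t in range(T - 2, -1, -1):
    for k in range(K):
        beta[t][k] = sum(A[k][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                         for j in range(K))

# Forward sampling: draw z_1, then each z_t given z_{t-1},
# weighted by the backward messages.
z = []
w = [pi[k] * B[k][obs[0]] * beta[0][k] for k in range(K)]
z.append(random.choices(range(K), weights=w)[0])
for t in range(1, T):
    w = [A[z[-1]][k] * B[k][obs[t]] * beta[t][k] for k in range(K)]
    z.append(random.choices(range(K), weights=w)[0])

print(z)  # one exact posterior sample of the state sequence
```

The quadratic-in-duration message passing that the semi-Markov extension adds is what makes the backward step so heavy in practice.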
  22. Evaluation experiment using artificial two- or three-word sentences with the five Japanese vowels • Five artificial words {aioi, aue, ao, ie, uo} were prepared by connecting the five Japanese vowels (e.g., "aioi ao"). • 30 sentences (25 two-word and 5 three-word sentences) were prepared, and each sentence was recorded twice by four Japanese speakers. • MFCC features (frame size = 25 ms, shift = 10 ms, frame rate = 100 Hz). * HDP-HLMs are trained separately for each speaker.
  23. Sample of results (e.g., ao-ie-ao) • Compared to the conventional DAA, NPB-DAA could discover latent words accurately. • The inference procedure could gradually estimate the boundaries of words and phonemes.
  24. Unsupervised word discovery vs. speech recognition with a trained phoneme recognizer (off-the-shelf ASR system). Nonparametric Bayesian Double Articulation Analyzer (NPB-DAA) based on HDP-HLM [Taniguchi '16] • The method could estimate language and acoustic/phoneme models simultaneously. • The methods were compared using the ARI (adjusted Rand index) of the frame-based clustering task. • NPB-DAA even outperformed an off-the-shelf speech recognition system-based method in the word discovery task. Tadahiro Taniguchi, Shogo Nagasaka, and Ryo Nakashima, "Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals," IEEE Transactions on Cognitive and Developmental Systems (2016).
  25. Double Articulation Analyzer with Deep Sparse Autoencoder for unsupervised word discovery from speech signals • A deep learning-based feature extraction method, the deep sparse autoencoder (DSAE), was employed to increase the performance of NPB-DAA. • The DSAE can be trained in an unsupervised manner; therefore, the total learning system remains an unsupervised learning system.
  26. Experimental results: NPB-DAA with DSAE even outperformed the MFCC-based off-the-shelf speech recognition system. Tadahiro Taniguchi, Ryo Nakashima, Hailong Liu, and Shogo Nagasaka, "Double Articulation Analyzer with Deep Sparse Autoencoder for Unsupervised Word Discovery from Speech Signals," Advanced Robotics (2016).
  27. Contents: 1. Introduction (symbol emergence in robotics) 2. Word discovery with multimodal categorization 3. Direct word discovery from speech signals 4. Conclusion @tanichu
  28. Conclusion • Symbol emergence in robotics (SER) was introduced. SER is a synthetic approach toward developmental mental systems involving language acquisition and symbol emergence systems. • An unsupervised machine learning method for word discovery by robots, NPB-DAA, was introduced. It is based on a nonparametric Bayesian approach.
  29. Current problems and future challenges • Current problems – Computational cost: analyzing only 60 sentences requires more than one hour. – Speaker dependency: unsupervised learning from multi-speaker speech signals is currently difficult because acoustic features differ from speaker to speaker. • Future challenges – Efficient and fast algorithms: inventing more efficient inference methods and using more computational resources are our future directions. – Unsupervised speaker adaptation: developing an unsupervised speaker adaptation method for language acquisition from multi-speaker speech signals. – Mutual learning of words, phonemes, and objects: phoneme acquisition performance is expected to increase by learning phonemes and objects simultaneously.
  30. Future challenge: word discovery for symbol emergence in robotics, connecting NPB-DAA (phonemes, phonemes & words) with multimodal object categorization, SLAM, motion primitives, affordance learning, and syntax learning via probabilistic information.
  31. Future challenge: 2005-2015 / 2016-
  32. Information • email: taniguchi@ci.ritsumei.ac.jp • Visit http://www.tanichu.com/ • Facebook: please search me • Twitter: @tanichu • [GitHub] NPB-DAA: https://github.com/EmergentSystemLabStudent/NPB_DAA • Special thanks: Ritsumeikan University (R. Nakashima, S. Nagasaka, A. Taniguchi, K. Hayashi), DENSO Co. (T. Bando, K. Takenaka, K. Hitomi), Okayama Pref. Univ. (N. Iwahashi).
