Emotional TTS

Overview of emotional speech synthesis.

Published in: Technology

  1. Emotional Speech Synthesis: State of the art 2009. Felix Burkhardt, 19.05.2009
  2. outline: how to model and why simulate emotions?; emotions in speech; introduction to speech synthesis approaches; examples, examples, examples; conclusion and outlook. Emotional Speech Synthesis - Felix Burkhardt, 19.05.2009
  3. contents: how to model and why simulate emotions?; emotions in speech; overview of speech synthesis; examples, examples, examples; conclusion, outlook
  4. emotion models: "…everyone except a psychologist knows what an emotion is" (Young 1973). categories, e.g. anger, joy, …; dimensions, e.g. activation, dominance, valence; appraisals, e.g. novelty, intrinsic pleasantness, relevance, coping potential, … [figure: emotion cube with axes arousal, dominance and valence, locating anger, joy, despair, neutral, content, boredom and sadness; source: Burkhardt 2001]
  5. why model emotional behaviour? aspects of emotion modeling in human-machine interaction [figure; source: Batliner et al. 2006]
  6. applications of emotional tts [arranged along a time axis]: fun, e.g. emotional greetings; prosthesis; emotional chat avatars; gaming, believable characters; adapted dialog design; adapted persona design; target-group specific advertising; … believable agents; … artificial humans
  7. aspects of emotional tts
  8. contents: why simulate emotions?; emotions in speech; overview of speech synthesis; examples, examples, examples; conclusion, outlook
  9. speech features: descriptive layers of speech [figure; source: Reynolds et al. 2003]
  10. emotion in speech: spectrograms from emotional acted speech (neutral, angry, happy, bored, frightened, sad) [figure; source: TUB emotional database]
  11. emotional data? actors vs. reality. Berlin EmoDB: 10 actors x 7 emotions x 10 sentences (EmoDB: Burkhardt et al. 2005). alternatives: induced data, e.g. Aibo; television, radio data
  12. how to describe emotion? EmotionML, incubator group at W3C (http://www.w3.org/2005/Incubator/emotion/). Example, embedded in SSML:
          <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
            <voice gender="female">
              <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
                Hi, I am sad now but start getting angry...
              </prosody>
            </voice>
            <emotion>
              <category name="sadness" set="basic" intensity="0.6"/>
              <timing start="10%" end="50%"/>
            </emotion>
            <emotion>
              <category name="anger" set="basic" intensity="0.4"/>
              <timing start="50%" end="100%"/>
            </emotion>
          </speak>
  13. Loquendo TTS Director [screenshot; source: Loquendo]
  14. contents: why simulate emotions?; emotions in speech; introduction to speech synthesis approaches; examples, examples, examples; conclusion, outlook
  15. speech synthesis taxonomy: speech synthesis systems comprise voice response systems; re(copy)-synthesis, voice transformation and voice conversion; and arbitrary speech synthesizers, i.e. text-to-speech (unknown input) and concept-to-speech (input from text-generation system)
  16. tts process chain: NLP (natural language processing): preprocessing, morpho-syntactic analysis, transcription, prosody modeling -> phonetic transcription and prosody track -> DSP (digital speech processing): unit concatenation / search, prosody fitting, edge smoothing
  17. synthesis approaches [diagram]: system modeling (articulatory synthesis, vocal tract shape synthesis) vs. signal modeling (label: "pseudo articulatory"); rule based (expert systems) vs. data based (statistical model generated); formant synthesis; concatenative synthesis; non-uniform unit selection; HMM (hidden Markov models); ANN (neural nets). type of units: syllables, diphones, allophones, subsegments. coding of units: parametric coded (LPC linear predictive coding, MFCC mel frequency cepstral coefficients, MBR multi band resynthesis, formants); waveform coded (PCM, LDM linear delta modulation); hybrid approaches (MBR-PSOLA, RELP)
  18. historic development [figure: timeline 1780 … 1980, 1990, 2000, from historic to modern]: articulatory (von Kempelen); formant synthesis, e.g. DECtalk; PSOLA based synthesis, e.g. Elan; non-uniform unit selection, e.g. RealSpeak. The trend runs from flexible, artificial sounding, domain independent systems to not flexible, natural sounding, domain dependent ones
  19. system modeling
  20. source filter model: formant synthesizer (Klatt 1980) [figure; source: Klatt80]
  21. contents: why simulate emotions?; emotions in speech; overview of speech synthesis; examples, examples, examples; conclusion, outlook
  22. examples: emofilt. open source Java program based on the MBROLA synthesis engine; NOT a complete text-to-speech system; prosody filter between the natural language processing and digital speech signal processing modules; as multilingual as MBROLA, which currently supports 35 languages
  23. examples: emoSpeak. emoSpeak is integrated into the MARY text-to-speech framework by DFKI. In his Ph.D. thesis, Marc Schröder investigated how to assign rule-based modifications of speech to emotional dimensions. The system can be freely downloaded. source: Schröder 2004
  24. examples: voice conversion. Murtaza Bulut et al., USC: PSOLA - LPC conversion (samples: neutral, angry); Greg Beller, IRCAM: phase vocoder (samples: neutral, sad)
  25. examples: voice transformation. Olivier Rosec, FranceTelecom 2009: mixed LF + harmonic model (samples: woman, as boy, as man; man, breathy, whispery, tense); Shiva Sundaram, USC 2007: laughter synthesis by LPC synthesis and mass-spring model
  26. examples: formant synthesis. AffectEditor (J. Cahn, MIT 1998): DECtalk prosody rules (samples: sad, angry); EmoSyn (Burkhardt, 2000): prosody rules + phonation model (samples: neutral, sad, angry, crying, content)
  27. examples: diphone synthesis. MARY (M. Schröder, DFKI): prosody rules for dimensions, three inventories for soft, normal and tense speech (samples: joy, angry); EmoFilt (Burkhardt, 1999): prosody rules (samples: neutral, joy)
  28. examples: statistical based. Tokyo Institute, Kobayashi Lab: HMM models spectral and prosodic features (samples: neutral, joy)
  29. examples: unit selection. fun personality voices; Damian; Shouty; CTTS with expressive units; extralinguistic units; Katrin [slide labels: product, research]
  30. examples: non human. Oudeyer, Sony pet robots: concatenative (samples: happy, sad); MIT Kismet robot: formant synthesis (samples: anger, fear)
  31. examples: singing. vocal tract lab (Peter Birkholz, 2007): articulatory, "dona nobis"; pavarobotti (Ingo Titze, 1993): articulatory, aria; Bell Labs (Gerstman & Mathews, 1961): articulatory, "bicycle" song, first song ever
  32. more examples … http://emosamples.syntheticspeech.de
  33. contents: why simulate emotions?; emotions in speech; overview of speech synthesis; examples, examples, examples; conclusion, outlook
  34. conclusion: emotions are part of natural speech; simulation is possible by either modeling the process or including emotional data; text-to-speech still struggles with intelligible, neutral speech; first steps: speaking styles, extralinguistics; first apps: fun, gaming
  35. outlook: discrepancy between natural but inflexible vs. artificial sounding but flexible solutions. short to middle term: very large databases; hybrid parametric / non-uniform unit selection; voice transformation techniques; high quality source filter model based synthesis. solutions in the long run: physical modeling
  36. references
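
The EmotionML-in-SSML markup on slide 12 can be read back with a few lines of Python. This is a minimal sketch using only the standard library; it assumes the 2009 incubator-draft syntax exactly as shown on the slide (later EmotionML versions may use different element and attribute names), and the function name `emotion_segments` is made up for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# The slide's example; the <emotion> elements inherit the SSML default namespace.
DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
      Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
    <category name="sadness" set="basic" intensity="0.6"/>
    <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
    <category name="anger" set="basic" intensity="0.4"/>
    <timing start="50%" end="100%"/>
  </emotion>
</speak>"""


def emotion_segments(xml_text):
    """Return one dict per <emotion> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    ns = {"s": SSML_NS}
    segments = []
    for emo in root.findall("s:emotion", ns):
        category = emo.find("s:category", ns)
        timing = emo.find("s:timing", ns)
        segments.append({
            "name": category.get("name"),
            "intensity": float(category.get("intensity")),
            "start": timing.get("start"),
            "end": timing.get("end"),
        })
    return segments


if __name__ == "__main__":
    for seg in emotion_segments(DOC):
        print(seg)
```

A synthesizer front end could use such a list to decide which prosody rules to apply to which part of the utterance.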
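
The process chain on slide 16 (NLP stages producing a phonetic transcription and prosody track, which the DSP stages turn into a unit sequence) can be sketched as a chain of functions. Everything below is a toy stand-in under loud assumptions: one "phone" per letter, a linearly falling pitch, and diphone names built from neighbouring phones. None of it reflects how a real synthesizer implements these stages, and all function names and default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phone: str
    duration_ms: int
    f0_hz: float


# --- NLP part -------------------------------------------------------------
def preprocess(text):
    # Toy text normalization: lowercase, keep letters and spaces only.
    return "".join(c for c in text.lower() if c.isalpha() or c == " ")


def analyse(text):
    # Stand-in for morpho-syntactic analysis: plain whitespace tokenization.
    return text.split()


def transcribe(words):
    # Toy transcription: one "phone" per letter, "_" as pause between words.
    phones = ["_"]
    for word in words:
        phones.extend(word)
        phones.append("_")
    return phones


def model_prosody(phones, base_f0=120.0, fall_hz=20.0):
    # Toy prosody model: fixed durations and a linearly falling pitch.
    n = len(phones)
    return [
        Segment(p, 150 if p == "_" else 80, base_f0 - fall_hz * i / max(n - 1, 1))
        for i, p in enumerate(phones)
    ]


def nlp(text):
    return model_prosody(transcribe(analyse(preprocess(text))))


# --- DSP part -------------------------------------------------------------
def select_units(segments):
    # Unit search: name a diphone for each pair of adjacent phones.
    return [f"{a.phone}-{b.phone}" for a, b in zip(segments, segments[1:])]


def dsp(segments):
    # Prosody fitting: attach the left phone's targets to each unit.
    # Edge smoothing is omitted in this sketch.
    units = select_units(segments)
    return [(u, s.duration_ms, round(s.f0_hz, 1)) for u, s in zip(units, segments)]


if __name__ == "__main__":
    for unit in dsp(nlp("Hi you")):
        print(unit)
```

The point of the split is that an emotional prosody filter can sit between `nlp` and `dsp` and modify only the segment list.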
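
The source filter model on slide 20 can be illustrated with the second-order digital resonator used in Klatt's (1980) formant synthesizer, y(nT) = A x(nT) + B y(nT - T) + C y(nT - 2T), with C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2π·F·T) and A = 1 - B - C. The sketch below cascades three such resonators over a plain impulse train. The impulse-train source, the formant frequencies and bandwidths, and all function names are simplifying assumptions for illustration, not the full Klatt design.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients A, B, C of a Klatt-style second-order resonator."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs):
    """Filter a signal: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, dur_s, fs):
    """Very crude voicing source: one unit impulse per pitch period."""
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(int(dur_s * fs))]


def synth_vowel(f0_hz=110.0,
                formants=((700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)),
                dur_s=0.2, fs=16000):
    """Source passed through a cascade of formant resonators.

    The (frequency, bandwidth) pairs are illustrative values only.
    """
    signal = impulse_train(f0_hz, dur_s, fs)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    return signal


if __name__ == "__main__":
    samples = synth_vowel()
    print(len(samples), max(abs(s) for s in samples))
```

Rule-based emotional synthesis in this framework amounts to changing parameters such as `f0_hz` or the formant targets over time.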
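
Slide 22 describes emofilt as a prosody filter sitting in front of the MBROLA engine. MBROLA reads a phoneme file in which each line holds a phoneme name, a duration in milliseconds and optional (position %, pitch Hz) pairs, so such a filter can be sketched as a rewrite of that list. The scaling rule and its values below are invented for illustration and are not emofilt's actual rules; the phoneme names are placeholders that depend on the MBROLA voice in use.

```python
NEUTRAL = """; toy input, phoneme names depend on the voice database
_ 100
h 60 50 120
a 120 0 120 100 100
_ 100"""


def parse_pho(text):
    """Parse phoneme lines into (phone, duration_ms, [(pos_percent, f0_hz), ...])."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.split()
        numbers = [float(p) for p in parts[2:]]
        targets = list(zip(numbers[0::2], numbers[1::2]))
        segments.append((parts[0], float(parts[1]), targets))
    return segments


def apply_emotion(segments, pitch_scale, rate_scale):
    """Hypothetical rule: scale all pitch targets and the speech rate."""
    return [
        (phone, dur / rate_scale, [(pos, f0 * pitch_scale) for pos, f0 in targets])
        for phone, dur, targets in segments
    ]


def to_pho(segments):
    """Serialize segments back into phoneme-file lines."""
    lines = []
    for phone, dur, targets in segments:
        fields = [phone, f"{dur:.0f}"]
        fields += [f"{pos:.0f} {f0:.0f}" for pos, f0 in targets]
        lines.append(" ".join(fields))
    return "\n".join(lines)


if __name__ == "__main__":
    # Higher and faster, as a stand-in for an aroused emotion.
    print(to_pho(apply_emotion(parse_pho(NEUTRAL), pitch_scale=1.25, rate_scale=1.25)))
```

Because the filter only touches the phoneme list, it works for any language the underlying synthesis engine supports.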
