speech recognition and synthesis overview

  1. Speech Technology Overview. Presented by Amr Medhat, Computer Engineering Department, Cairo University, 22-10-2005
  2. Speech… Why? The easiest way of communication for human beings
  3. Speech… How? Diagram labels: Sender, Message, Receiver; Channel; Noise; Signal + …; Protocol
  4. Computer Analogy: Speech Production corresponds to Speech Synthesis (TTS): text → speech. Speech Perception corresponds to Speech Recognition (ASR): speech → text.
  5. Recognition Made Easy: Example utterances: "I bought a boat.", "افرنقعوا أيها المتكأكئين", "gute Nacht". Components: Feature Extraction → Decoder (Search), which draws on the Grammar, Lexicon, and Phone Models
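
The decoder's search on this slide is commonly written as a maximization over word sequences. The formula below is the standard textbook formulation, not something stated on the slide; the symbols $O$ (the sequence of extracted feature vectors) and $W$ (a candidate word sequence) are notation assumed here.

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W)
```

Under this reading, $P(O \mid W)$ is scored by the phone models combined through the lexicon, and $P(W)$ comes from the grammar (language model); $P(O)$ is dropped because it does not depend on $W$.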
  6. Recognizer Characteristics: Discrete words / continuous speech; Read / spontaneous speech; Speaker dependent / independent; Small / large vocabulary; Finite state / context sensitive language model
  7. What to study: Phonetics and Phonology (Linguistics); Speech Signal Processing (DSP); Pattern Recognition (AI): Hidden Markov Models, Artificial Neural Networks, Hybrid ANN-HMM
  8. 8. Phonetics Phonetics: study of the production, perception, and physical properties of speech sounds Phonology: describes the way sounds function within a given language and how they are combined and organized Phoneme: The smallest phonetic unit in a language that is capable of conveying a distinction in meaning E.g.  boat-bought, car-jar, ‫نشاط-شمس ,أرض-أحمد‬
  9. Speech Signal Processing: Sampling: Rate: e.g. 16 kHz; Sample size: e.g. 16 bits. Format: PCM (.wav files). Time or frequency domain features? Spectrogram: represents the time-varying spectrum of a signal (x = time, y = frequency, intensity = energy). How to represent features? Filter banks, LPCs, MFCCs
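
A minimal sketch of the steps on this slide: write and read 16 kHz, 16-bit PCM with Python's standard `wave` module, then build a small spectrogram as a grid of per-frame magnitude spectra. The 1 kHz test tone, the 256-sample frame, the 128-sample hop, the Hamming window, and all function names are assumptions chosen for illustration. A naive DFT is used instead of an FFT only to keep the example dependency-free.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000   # 16 kHz, as on the slide
SAMPLE_WIDTH = 2      # 16 bits = 2 bytes per sample


def make_tone_wav(freq_hz, duration_s=0.1):
    """Return the bytes of a mono 16-bit PCM .wav file holding a sine tone."""
    n = int(SAMPLE_RATE * duration_s)
    samples = [int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE))
               for t in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % n, *samples))
    return buf.getvalue()


def read_pcm(wav_bytes):
    """Decode 16-bit mono PCM samples into floats in [-1, 1)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        assert w.getsampwidth() == SAMPLE_WIDTH and w.getnchannels() == 1
        raw = w.readframes(w.getnframes())
        rate = w.getframerate()
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints], rate


def magnitude_spectrum(frame):
    """Naive DFT magnitudes of one Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        mags.append(math.hypot(re, im))
    return mags


def spectrogram(samples, frame_len=256, hop=128):
    """One magnitude spectrum per frame: the (time, frequency, intensity) grid."""
    return [magnitude_spectrum(samples[s:s + frame_len])
            for s in range(0, len(samples) - frame_len + 1, hop)]


samples, rate = read_pcm(make_tone_wav(1000.0))
spec = spectrogram(samples)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 256
print(len(spec), "frames; strongest component near", peak_hz, "Hz")
```

Each row of `spec` is one time step and each column one frequency bin, which is the grid a spectrogram image displays. Filter banks and MFCCs are typically computed from such per-frame spectra, but they are not implemented here.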
  10. Spectrogram: Waveform and spectrogram of the word "phonetician"
  11. HMM: What is a model? The coins example. Parameter estimation: Baum-Welch. Decoding: Viterbi. P(O | λ)
  12. 12. Tools Audio Editing  Cool Edit ( )  Gold Wave  Sound Forge ASR  HTK ( )  MATLAB  Microsoft SAPI SDK  Java Speech API  ISIP ASR Toolkit  Torch (Machine learning tool)
  13. 13. Technologies and applications Speech Recognition  Dictation  Call centers & IVR systems  Command and control Speech Verification: Pronunciation teaching Speaker Recognition: Security Speech Synthesis  Reading for the blind  Telephone inquiries
  14. Can Image Processing Help? Audio Visual Speech Recognition; Spectrogram Reading; Spectrogram Filtering; vOICe: seeing with sound