WHAT IS SPEECH PROCESSING?
Florian Leibert
INTRODUCTION
• Florian “Flo” Leibert earned a bachelor's degree in computer
science and business from International University in Bruchsal,
Germany, in 2006. While attending university, he worked on
many machine learning projects, including speech processing.
Communicating with computers through speech has been an
area of intense research for decades. Basic speech recognition
software can identify only a limited number of words and
phrases, and only when they are clearly enunciated. As speech
recognition software becomes more advanced, it can identify
and accept more natural speech.
A machine takes several steps to convert speech to text.
First, an analog-to-digital converter (ADC) converts the analog
wave produced by the vibrations of the human voice into digital
data that a computer can read.
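
The ADC step can be illustrated in software with a minimal sketch. It assumes a 440 Hz sine tone standing in for a voice, a 16 kHz sample rate, and 16-bit samples. None of these values come from the slides, and the function name sample_and_quantize is hypothetical.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bit_depth):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to [-1, 1]
        samples.append(round(amplitude * max_level))
    return samples

def tone(t):
    """Assumed stand-in for a voice: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

# 10 ms of audio at 16 kHz, 16 bits per sample (illustrative values)
pcm = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=16000, bit_depth=16)
print(len(pcm), pcm[:5])
```

Sampling picks discrete points in time and quantization rounds each amplitude to one of a fixed number of levels. A higher sample rate or bit depth keeps more of the original wave at the cost of more data.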
SPEECH PROCESSING
• Acoustic and language modeling algorithms match sounds with
words and phrases so that the software can transcribe them
accurately and distinguish between similar-sounding words.
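
One way to picture how the two models cooperate is the toy sketch below. It is not a real recognizer: every probability is invented for illustration, and the names acoustic, bigram, and best_word are hypothetical. The acoustic scores barely separate "to", "two", and "too", so the language model score for the preceding word decides.

```python
import math

# Invented acoustic scores: how well each word matches the same sound.
acoustic = {"two": 0.34, "too": 0.33, "to": 0.33}

# Invented bigram language model: P(word | previous word).
bigram = {
    ("want", "to"): 0.20,
    ("want", "two"): 0.01,
    ("want", "too"): 0.001,
}

def best_word(previous, candidates):
    """Pick the candidate with the highest combined log score."""
    def score(word):
        lm = bigram.get((previous, word), 1e-6)  # small floor for unseen pairs
        return math.log(candidates[word]) + math.log(lm)
    return max(candidates, key=score)

print(best_word("want", acoustic))  # prints "to"
```

Adding log probabilities is equivalent to multiplying the probabilities and avoids numeric underflow on longer sentences.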
The accuracy and speed of voice recognition software
determine its performance. The word error rate (WER)
measures transcription accuracy but cannot indicate
whether an error was caused by pronunciation, volume,
background noise, or other factors.
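
WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below assumes that standard definition and simple whitespace tokenization. The function name word_error_rate and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("recognize speech with a computer",
                      "wreck a nice beach with a computer")
print(wer)  # 0.8: two substitutions and two insertions over five words
```

The result counts errors only. As the slide notes, it carries no information about whether pronunciation, volume, or background noise caused them.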
