Speech Compression

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment
  • Hello, today I will talk about the techniques commonly used for digital audio compression across various audio file formats.
  • -I will discuss the difference between redundant and irrelevant information further in my presentation. -Depending on whether the goal is storage or transmission, the optimization targets required space or bandwidth.

Speech Compression: Presentation Transcript

  • Audio Compression Techniques MUMT 611, January 2005 Assignment 2 Paul Kolesnik
  • Introduction
    • Digital Audio Compression
      • Removal of redundant or otherwise irrelevant information from audio signal
      • Audio compression algorithms are often referred to as “audio encoders”
    • Applications
      • Reduces required storage space
      • Reduces required transmission bandwidth
  • Audio Compression
    • Audio signal – overview
      • Sampling rate (# of samples per second)
      • Bit rate (# of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbit/s
      • Number of channels (mono / stereo / multichannel)
    • Reduction by lowering those values or by data compression / encoding
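The bit rate above is just the product of the three values; a quick sketch in Python (a language chosen for illustration, since the slides contain no code):

```python
# Uncompressed PCM bit rate = sampling rate * bits per sample * channels.
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

# CD-quality stereo: 44.1 kHz, 16 bits, 2 channels.
rate = pcm_bit_rate(44_100, 16, 2)
print(rate)  # 1411200 bits per second, i.e. ~1.4 Mbit/s
```

Lowering any of the three factors (downsampling, fewer bits per sample, mono instead of stereo) reduces the rate before any encoding is applied.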
  • Audio Data Compression
    • Redundant information
      • Implicit in the remaining information
      • Ex. oversampled audio signal
    • Irrelevant information
      • Perceptually insignificant
      • Cannot be recovered from remaining information
  • Audio Data Compression
    • Lossless Audio Compression
      • Removes redundant data
      • Resulting signal is same as original – perfect reconstruction
    • Lossy Audio Encoding
      • Removes irrelevant data
      • Resulting signal is similar to original
  • Audio Data Compression
    • Audio vs. Speech Compression Techniques
      • Speech Compression uses a human vocal tract model to compress signals
      • Audio Compression does not use this technique due to larger variety of possible signal variations
  • Generic Audio Encoder
  • Generic Audio Encoder
    • Psychoacoustic Model
      • Psychoacoustics – study of how sounds are perceived by humans
      • Uses perceptual coding
        • eliminate information from audio signal that is inaudible to the ear
      • Detects conditions under which different audio signal components mask each other
  • Psychoacoustic Model
    • Signal Masking
      • Threshold cut-off
      • Spectral (Frequency / Simultaneous) Masking
      • Temporal Masking
    • Threshold cut-off and spectral masking occur in frequency domain, temporal masking occurs in time domain
  • Signal Masking
    • Threshold cut-off
      • Hearing threshold level – a function of frequency
      • Any frequency components below the threshold will not be perceived by human ear
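The frequency-dependent threshold can be approximated in closed form; the sketch below uses Terhardt's well-known fit for the absolute threshold of hearing (the formula is standard, the helper names are my own):

```python
import math

def hearing_threshold_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's closed-form fit; input frequency in Hz)."""
    f = f_hz / 1000.0  # convert to kHz for the formula
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(level_db_spl, f_hz):
    """Components below the threshold at their frequency can be dropped."""
    return level_db_spl > hearing_threshold_db(f_hz)

# The ear is most sensitive around 2-5 kHz, where the curve dips lowest.
print(hearing_threshold_db(100) > hearing_threshold_db(3000))  # True
print(is_audible(5, 100))  # False: a quiet 100 Hz tone falls below threshold
```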
  • Signal Masking
    • Spectral Masking
      • A frequency component can be partly or fully masked by another component that is close to it in frequency
      • This shifts the hearing threshold
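As a toy illustration of that threshold shift (the linear roll-off slope below is invented for illustration, not taken from any psychoacoustic standard):

```python
def shifted_threshold_db(base_threshold_db, masker_freq_hz,
                         masker_level_db, f_hz):
    """Toy spectral-masking model: a loud masker raises the effective
    hearing threshold for nearby frequencies, the effect fading as the
    frequency distance grows (0.02 dB/Hz is an arbitrary slope)."""
    distance_hz = abs(f_hz - masker_freq_hz)
    raised = masker_level_db - 0.02 * distance_hz
    return max(base_threshold_db, raised)

# A 60 dB masker at 1 kHz hides a 30 dB component at 1.1 kHz:
thr = shifted_threshold_db(10.0, 1000.0, 60.0, 1100.0)
print(thr)  # 58.0 dB -- the 30 dB neighbour falls below it and is masked
```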
  • Signal Masking
    • Temporal Masking
      • A quieter sound can be masked by a louder sound if they are temporally close
      • Sounds that occur shortly before as well as shortly after the volume increase can be masked (pre- and post-masking)
  • Spectral Analysis
    • Tasks of Spectral Analysis
      • To derive masking thresholds to determine which signal components can be eliminated
      • To generate a representation of the signal to which masking thresholds can be applied
    • Spectral Analysis is done through transforms or filter banks
  • Spectral Analysis
    • Transforms
      • Fast Fourier Transform (FFT)
      • Discrete Cosine Transform (DCT) - similar to FFT but uses cosine values only
      • Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3] – overlapped and windowed version of DCT
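A minimal NumPy sketch of an MDCT/IMDCT pair: with a sine window (which satisfies the Princen-Bradley condition), overlap-adding the windowed inverse transforms of 50%-overlapped blocks reconstructs the signal exactly. Block size and normalization here are illustrative choices, not those of any particular codec:

```python
import numpy as np

def mdct(x, N):
    """MDCT of a 2N-sample block -> N coefficients."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ x

def imdct(X, N):
    """Inverse MDCT: N coefficients -> 2N (time-aliased) samples."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ X)

N = 8
# Sine window: satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley condition).
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

rng = np.random.default_rng(0)
x = rng.standard_normal(3 * N)      # long enough for two overlapping blocks

# Two 50%-overlapped blocks, analysis-windowed then synthesis-windowed.
y1 = window * imdct(mdct(window * x[0:2 * N], N), N)
y2 = window * imdct(mdct(window * x[N:3 * N], N), N)
middle = y1[N:] + y2[:N]            # overlap-add cancels the time aliasing
print(np.allclose(middle, x[N:2 * N]))  # True: perfect reconstruction
```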
  • Spectral Analysis
    • Filter Banks
      • Time sample blocks are passed through a set of bandpass filters
      • Masking thresholds are applied to resulting frequency subband signals
      • Poly-phase and wavelet banks are most popular filter structures
  • Filter Bank Structures
    • Polyphase Filter Bank [used in all of the MPEG-1 encoders]
      • Signal is separated into subbands, the widths of which are equal over the entire frequency range
      • The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during decoding process)
  • Filter Bank Structures
    • Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]
      • Unlike polyphase filter, the widths of the subbands are not evenly spaced (narrower for higher frequencies)
      • This allows for better time resolution (ex. short attacks), but at expense of frequency resolution
  • Noise Allocation
    • System Task: derive and apply shifted hearing threshold to the input signal
      • Anything below the threshold doesn’t need to be transmitted
      • Any noise below the threshold is irrelevant
    • Frequency component quantization
      • Tradeoff between space and noise
      • Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold - this is known as noise allocation
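The bit-count tradeoff can be sketched with the rule of thumb that each quantizer bit buys roughly 6 dB of signal-to-noise ratio; the per-band levels below are made up for illustration:

```python
import math

def bits_for_band(signal_level_db, mask_threshold_db):
    """Smallest bit count keeping quantization noise under the masking
    threshold, using the ~6.02 dB-per-bit rule for uniform quantizers."""
    smr = signal_level_db - mask_threshold_db  # signal-to-mask ratio
    if smr <= 0:
        return 0  # component is below the threshold: drop it entirely
    return math.ceil(smr / 6.02)

# Hypothetical (signal level, shifted threshold) pairs in dB, per band.
bands = [(60, 30), (45, 40), (20, 35)]
alloc = [bits_for_band(s, t) for s, t in bands]
print(alloc)  # [5, 1, 0] -- the third band is inaudible, so it costs nothing
```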
  • Noise Allocation
    • Pre-echo
      • In case a single audio block contains silence followed by a loud attack, pre-echo error occurs - there will be audible noise in the silent part of the block after decoding
      • This is avoided by pre-monitoring audio data at encoding stage and separating audio into shorter blocks in potential pre-echo case
      • This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
  • Pre-echo Effect
  • Additional Encoding Techniques
    • Other encoding techniques are available (as alternatives or in combination)
      • Predictive Coding
      • Coupling / Delta Encoding
      • Huffman Encoding
  • Additional Encoding Techniques
    • Predictive Coding
      • Often used in speech and image compression
      • Estimates the expected value for each sample based on previous sample values
      • Transmits/stores the difference between the expected and received value
      • Generates an estimate for the next sample and then adjusts it by the difference stored for the current sample
      • Used for additional compression in MPEG-2 AAC
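A first-order version of this idea (predict each sample as the previous one, store only the residual) is easy to sketch; real codecs use higher-order predictors:

```python
def predictive_encode(samples):
    """Store the difference between each sample and its predicted value.
    The predictor here is simply the previous sample (first order)."""
    residuals = []
    prediction = 0
    for s in samples:
        residuals.append(s - prediction)
        prediction = s  # next prediction = current sample
    return residuals

def predictive_decode(residuals):
    """Rebuild each sample by adjusting the prediction by the residual."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples

x = [100, 102, 104, 103, 101]
enc = predictive_encode(x)
print(enc)                          # [100, 2, 2, -1, -2] -- small residuals
print(predictive_decode(enc) == x)  # True: lossless round trip
```

Because neighbouring audio samples are strongly correlated, the residuals cluster near zero and need fewer bits than the raw samples.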
  • Additional Encoding Techniques
    • Coupling / Delta encoding
      • Used in cases where audio signal consists of two or more channels (stereo or surround sound)
      • Similarities between channels are used for compression
      • The sum of and difference between the two channels are derived; the difference is usually close to zero and therefore requires less space to encode
      • This is a lossless encoding process
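A sketch of sum/difference coupling for a stereo pair; keeping the exact integer sum and difference makes the transform perfectly invertible:

```python
def ms_encode(left, right):
    """Sum/difference coupling: the difference is near zero for
    correlated channels, so it is cheap to encode."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff

def ms_decode(total, diff):
    # a = l + r, d = l - r  =>  l = (a + d) / 2, r = (a - d) / 2
    # a + d = 2l is always even, so integer division is exact.
    left = [(a + d) // 2 for a, d in zip(total, diff)]
    right = [(a - d) // 2 for a, d in zip(total, diff)]
    return left, right

L = [1000, 1001, 999]
R = [998, 1000, 1000]
t, d = ms_encode(L, R)
print(d)                           # [2, 1, -1] -- small values, cheap to store
print(ms_decode(t, d) == (L, R))   # True: lossless round trip
```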
  • Additional Encoding Techniques
    • Huffman Coding
      • Information-theory-based technique
      • An element of a signal that often reoccurs in the signal is represented by a simpler symbol, and its value is stored in a look-up table
      • Implemented using look-up tables in both the encoder and the decoder
      • Provides substantial lossless compression, but requires high computational power and therefore is not very popular
      • Used by MPEG-1 and MPEG-2 AAC
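A compact sketch of building such a look-up table with a priority queue (the table-building strategy is standard Huffman; the implementation details are my own):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman look-up table: frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak id, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

data = "aaaaabbc"
table = huffman_code(data)
encoded = "".join(table[s] for s in data)
# 'a' occurs most often, so it gets the shortest code.
print(len(table["a"]) < len(table["c"]))  # True
print(len(encoded))  # 11 bits, vs 16 with fixed 2-bit codes for 8 symbols
```

The decoder holds the same table and reads the bit stream prefix by prefix; no code is a prefix of another, so decoding is unambiguous.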
  • Encoding - Final Stages
    • Audio data packed into frames
    • Frames stored or transmitted
  • Conclusion
    • HTML Bibliography
      • http://www.music.mcgill.ca/~pkoles
    • Questions