Speech Compression


  • Hello. Today I will talk about the techniques commonly used for digital audio compression across various audio file formats.
  • I will discuss the difference between redundant and irrelevant information later in the presentation. Whether the goal is storage or transmission, compression optimizes size.

    1. Audio Compression Techniques
       MUMT 611, January 2005
       Assignment 2
       Paul Kolesnik
    2. Introduction
       • Digital Audio Compression
         ◦ Removal of redundant or otherwise irrelevant information from the audio signal
         ◦ Audio compression algorithms are often referred to as “audio encoders”
       • Applications
         ◦ Reduces required storage space
         ◦ Reduces required transmission bandwidth
    3. Audio Compression
       • Audio signal overview
         ◦ Sampling rate (number of samples per second)
         ◦ Bit rate (number of bits per second); an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbps
         ◦ Number of channels (mono / stereo / multichannel)
       • Size can be reduced by lowering these values or by data compression / encoding
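The arithmetic behind that bit-rate figure is worth making explicit; a quick sketch using the CD-quality parameters above:

```python
# Uncompressed PCM bit rate = sampling rate x bits per sample x channels.
sampling_rate = 44_100   # samples per second (CD quality)
bit_depth = 16           # bits per sample
channels = 2             # stereo

bit_rate = sampling_rate * bit_depth * channels
print(bit_rate)                 # 1411200 bits per second
print(bit_rate / 1_000_000)     # ~1.41 Mbps
print(bit_rate / 8 / 1024)      # ~172 KiB of storage per second of audio
```

Halving any one of the three parameters halves the bit rate, which is why mono, lower sample rates, or fewer bits per sample were common crude alternatives to true data compression.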
    4. Audio Data Compression
       • Redundant information
         ◦ Implicit in the remaining information
         ◦ Example: an oversampled audio signal
       • Irrelevant information
         ◦ Perceptually insignificant
         ◦ Cannot be recovered from the remaining information
    5. Audio Data Compression
       • Lossless Audio Compression
         ◦ Removes redundant data
         ◦ Resulting signal is identical to the original: perfect reconstruction
       • Lossy Audio Encoding
         ◦ Removes irrelevant data
         ◦ Resulting signal is similar to the original
    6. Audio Data Compression
       • Audio vs. Speech Compression Techniques
         ◦ Speech compression uses a model of the human vocal tract to compress signals
         ◦ Audio compression cannot rely on such a model because of the much larger variety of possible signals
    7. Generic Audio Encoder
    8. Generic Audio Encoder
       • Psychoacoustic Model
         ◦ Psychoacoustics: the study of how humans perceive sound
         ◦ Uses perceptual coding
           ▪ eliminates information in the audio signal that is inaudible to the ear
         ◦ Detects conditions under which different audio signal components mask each other
    9. Psychoacoustic Model
       • Signal Masking
         ◦ Threshold cut-off
         ◦ Spectral (Frequency / Simultaneous) Masking
         ◦ Temporal Masking
       • Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain
    10. Signal Masking
       • Threshold cut-off
         ◦ The hearing threshold level is a function of frequency
         ◦ Any frequency components below the threshold are not perceived by the human ear
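This threshold-in-quiet curve is often approximated in the perceptual-coding literature by Terhardt's formula; a small sketch (the formula is a textbook approximation, not one mandated by any particular codec):

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute hearing threshold (dB SPL) as a function of
    frequency, using Terhardt's formula from the perceptual-coding
    literature. Components below this curve at their frequency need
    not be encoded at all."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The ear is most sensitive around 3-4 kHz; the threshold rises steeply
# toward both the low and the high end of the audible range.
print(threshold_in_quiet_db(3300))   # around -5 dB: most sensitive region
print(threshold_in_quiet_db(100))    # around +23 dB: less sensitive
print(threshold_in_quiet_db(16000))  # around +66 dB: much less sensitive
```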
    11. Signal Masking
       • Spectral Masking
         ◦ A frequency component can be partly or fully masked by another component close to it in frequency
         ◦ This shifts the hearing threshold
    12. Signal Masking
       • Temporal Masking
         ◦ A quieter sound can be masked by a louder sound if the two are close in time
         ◦ Sounds occurring both (shortly) before and after a volume increase can be masked
    13. Spectral Analysis
       • Tasks of Spectral Analysis
         ◦ Derive masking thresholds to determine which signal components can be eliminated
         ◦ Generate a representation of the signal to which the masking thresholds can be applied
       • Spectral analysis is performed with transforms or filter banks
    14. Spectral Analysis
       • Transforms
         ◦ Fast Fourier Transform (FFT)
         ◦ Discrete Cosine Transform (DCT): similar to the FFT but uses cosine basis functions only
         ◦ Modified Discrete Cosine Transform (MDCT): an overlapped, windowed version of the DCT [used by MPEG-1 Layer III, MPEG-2 AAC, and Dolby AC-3]
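As a concrete illustration of the MDCT's overlapped structure, here is a minimal NumPy sketch. Normalization conventions vary between references; this one uses an unnormalized forward transform, a 2/N inverse, and a sine window, under which 50%-overlapped blocks reconstruct the signal exactly:

```python
import numpy as np

def mdct(block):
    """MDCT of a block of 2N samples -> N coefficients."""
    two_n = len(block)
    n = two_n // 2
    k = np.arange(n)
    t = np.arange(two_n)
    basis = np.cos(np.pi / n * (t[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5))
    return block @ basis

def imdct(coeffs):
    """Inverse MDCT of N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    k = np.arange(n)
    t = np.arange(2 * n)
    basis = np.cos(np.pi / n * (t[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5))
    return (2.0 / n) * (basis @ coeffs)

# The sine window satisfies the Princen-Bradley condition, so overlap-adding
# 50%-overlapped inverse blocks cancels the time-domain aliasing exactly.
N = 8
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

rng = np.random.default_rng(0)
x = rng.standard_normal(3 * N)
y1 = window * imdct(mdct(window * x[0:2 * N]))
y2 = window * imdct(mdct(window * x[N:3 * N]))
middle = y1[N:] + y2[:N]                # overlap-add of the two blocks
print(np.allclose(middle, x[N:2 * N]))  # True: perfect reconstruction
```

Each block of 2N samples yields only N coefficients, so despite the 50% overlap the total number of coefficients equals the number of input samples: the transform is critically sampled.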
    15. Spectral Analysis
       • Filter Banks
         ◦ Blocks of time samples are passed through a set of bandpass filters
         ◦ Masking thresholds are applied to the resulting frequency subband signals
         ◦ Polyphase and wavelet banks are the most popular filter structures
    16. Filter Bank Structures
       • Polyphase Filter Bank [used in all of the MPEG-1 encoders]
         ◦ The signal is separated into subbands of equal width over the entire frequency range
         ◦ The resulting subband signals are downsampled to create shorter signals (which are reconstructed later, during decoding)
    17. Filter Bank Structures
       • Wavelet Filter Bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]
         ◦ Unlike in a polyphase bank, the subband widths are not equal: bands are wider at higher frequencies
         ◦ This gives better time resolution (e.g. for short attacks) at the expense of frequency resolution
    18. Noise Allocation
       • System task: derive the shifted hearing threshold and apply it to the input signal
         ◦ Anything below the threshold does not need to be transmitted
         ◦ Any noise below the threshold is irrelevant
       • Frequency component quantization
         ◦ Trade-off between space and noise
         ◦ The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation
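A minimal sketch of the idea, assuming hypothetical per-subband signal-to-mask ratios and the rule of thumb that each quantizer bit lowers quantization noise by roughly 6 dB (real encoders use an iterative allocation loop rather than this one-shot formula):

```python
import math

# Hypothetical signal-to-mask ratios (dB) for a handful of subbands:
# how far each subband's signal sits above its masking threshold.
smr_db = [24.5, 12.0, 3.0, -5.0, 18.2]

# Each quantizer bit buys about 6.02 dB of signal-to-noise ratio, so
# allocate just enough bits to push the noise under the threshold.
bits = [max(0, math.ceil(s / 6.02)) for s in smr_db]
print(bits)   # [5, 2, 1, 0, 4]: subbands below the threshold get 0 bits
```

The fourth subband sits entirely below its masking threshold (negative SMR), so it is simply not transmitted: that is exactly the "anything below the threshold doesn't need to be transmitted" rule above.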
    19. Noise Allocation
       • Pre-echo
         ◦ If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding, there is audible noise in the silent part of the block
         ◦ This is avoided by pre-monitoring the audio at the encoding stage and splitting the audio into shorter blocks where pre-echo is likely
         ◦ This does not eliminate pre-echo completely, but it can make the pre-echo short enough to be masked by the attack (temporal masking)
    20. Pre-echo Effect
    21. Additional Encoding Techniques
       • Other encoding techniques are available (as alternatives or in combination)
         ◦ Predictive Coding
         ◦ Coupling / Delta Encoding
         ◦ Huffman Encoding
    22. Additional Encoding Techniques
       • Predictive Coding
         ◦ Often used in speech and image compression
         ◦ Estimates the expected value of each sample from previous sample values
         ◦ Transmits/stores the difference between the expected and actual value
         ◦ The decoder generates the same estimate for the next sample and then adjusts it by the stored difference
         ◦ Used for additional compression in MPEG-2 AAC
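A minimal sketch of the idea with a first-order predictor ("expect the previous sample again"), which is one of the simplest possible predictors rather than the specific predictor AAC uses:

```python
def encode_predictive(samples):
    """Store each sample as its difference from the previous sample.
    Slowly varying signals produce small residuals that are cheap to code."""
    residual, previous = [], 0
    for s in samples:
        residual.append(s - previous)
        previous = s
    return residual

def decode_predictive(residual):
    """Rebuild each sample by adjusting the prediction by the stored difference."""
    samples, previous = [], 0
    for r in residual:
        previous += r
        samples.append(previous)
    return samples

pcm = [100, 102, 105, 104, 100, 95]
res = encode_predictive(pcm)
print(res)                            # [100, 2, 3, -1, -4, -5]: small values
print(decode_predictive(res) == pcm)  # True: lossless round trip
```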
    23. Additional Encoding Techniques
       • Coupling / Delta Encoding
         ◦ Used when the audio signal consists of two or more channels (stereo or surround sound)
         ◦ Similarities between channels are exploited for compression
         ◦ The sum of and difference between two channels are derived; the difference is usually close to zero and therefore requires less space to encode
         ◦ This is a lossless encoding step
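The sum/difference idea can be sketched in a few lines; the sample values here are made up for illustration:

```python
def couple(left, right):
    """Derive sum (mid) and difference (side) signals from two channels."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def decouple(mid, side):
    """Invert the coupling exactly: mid + side = 2 * left and
    mid - side = 2 * right are always even, so integer division is lossless."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

# Strongly correlated stereo channels: the side signal hovers near zero.
left = [1000, 1001, 999, 1002]
right = [998, 1003, 1000, 1001]
mid, side = couple(left, right)
print(side)                                  # [2, -2, -1, 1]: cheap to encode
print(decouple(mid, side) == (left, right))  # True: lossless
```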
    24. Additional Encoding Techniques
       • Huffman Coding
         ◦ An information-theory-based technique
         ◦ An element that recurs often in the signal is represented by a shorter symbol, and the mapping is stored in a look-up table
         ◦ Implemented with look-up tables in both the encoder and the decoder
         ◦ Provides substantial lossless compression, but requires high computational power and is therefore not always used
         ◦ Used by MPEG-1 and MPEG-2 AAC
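A compact sketch of Huffman coding over a toy symbol string (the generic textbook construction, not any codec's specific tables):

```python
import heapq
from collections import Counter

def huffman_table(data):
    """Build a prefix-code table: frequent symbols get shorter bit strings."""
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)  # tiebreaker so the dicts are never compared
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

data = "aaaaabbbccd"          # 'a' occurs most often
table = huffman_table(data)
bits = "".join(table[s] for s in data)

# The decoder walks the bit string through the inverted table; the prefix
# property guarantees an unambiguous parse.
inverse = {code: s for s, code in table.items()}
decoded, current = [], ""
for b in bits:
    current += b
    if current in inverse:
        decoded.append(inverse[current])
        current = ""

print(len(table["a"]))           # 1: shortest code for the commonest symbol
print("".join(decoded) == data)  # True: lossless round trip
```

The 11 symbols fit in 20 bits instead of 88 bits of 8-bit characters, which is the substance of the "substantial lossless compression" claim above.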
    25. Encoding: Final Stages
       • Audio data is packed into frames
       • Frames are stored or transmitted
    26. Conclusion
       • Bibliography (HTML)
         ◦ http://www.music.mcgill.ca/~pkoles
       • Questions