Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Lecture 8 audio compression


Published on

Published in: Education, Technology
  • Be the first to comment

Lecture 8 audio compression

  1. 1. Audio CompressionTechniques Lecture 8 Prepared by Razia Nisar Noorani 1
  2. 2. Introduction Digital Audio Compression  Removal of redundant or otherwise irrelevant information from audio signal  Audio compression algorithms are often referred to as “audio encoders” Applications  Reduces required storage space  Reduces required transmission bandwidth 2
  3. 3. Audio Compression Audio signal – overview  Sampling rate (# of samples per second)  Bit rate (# of bits per second). Typically, uncompressed stereo 16-bit 44.1KHz signal has a 1.4MBps bit rate  Number of channels (mono / stereo / multichannel) Reduction by lowering those values or by data compression / encoding 3
  4. 4. Audio Data Compression Redundant information  Implicit in the remaining information  Ex. oversampled audio signal  oversampling is the process of sampling a signal with a sampling frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled Irrelevant information  Perceptuallyinsignificant  Cannot be recovered from remaining information 4
  5. 5. Audio Data Compression Lossless Audio Compression  Removes redundant data  Resulting signal is same as original – perfect reconstruction Lossy Audio Encoding  Removes irrelevant data  Resulting signal is similar to original 5
  6. 6. Audio Data Compression Audio vs. Speech Compression Techniques  Speech Compression uses a human vocal tract model to compress signals  Audio Compression does not use this technique due to larger variety of possible signal variations 6
  7. 7. Generic Audio Encoder Psychoacoustic Model  Psychoacoustics – study of how sounds are perceived by humans  Uses perceptual coding  eliminate information from audio signal that is inaudible to the ear  Detectsconditions under which different audio signal components mask each other 7
  8. 8. Psychoacoustic Model Signal Masking  Threshold cut-off  Spectral (Frequency / Simultaneous) Masking  Temporal Masking Threshold cut-off and spectral masking occur in frequency domain, temporal masking occurs in time domain 8
  9. 9. Signal Masking Threshold cut-off  Hearing threshold level – a function of frequency  Any frequency components below the threshold will not be perceived by human ear 9
  10. 10. Signal Masking Spectral Masking A frequency component can be partly or fully masked by another component that is close to it in frequency  This shifts the hearing threshold 10
  11. 11. Signal Masking Temporal Masking A quieter sound can be masked by a louder sound if they are temporally close  Sounds that occur both (shortly) before and after volume increase can be masked 11
  12. 12. Spectral Analysis a device or algorithm that identifies a frequency domain representation of a time domain signal. Tasks of Spectral Analysis  To derive masking thresholds to determine which signal components can be eliminated  To generate a representation of the signal to which masking thresholds can be applied Spectral Analysis is done through transforms or filter banks 12
  13. 13. Spectral Analysis Transforms  Fast Fourier Transform (FFT)  Discrete Cosine Transform (DCT) - similar to FFT but uses cosine values only  Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3] – overlapped and windowed version of DCT 13
  14. 14. Spectral Analysis Filter Banks a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency subband of the original signal  Time sample blocks are passed through a set of bandpass filters  Masking thresholds are applied to resulting frequency subband signals  Poly-phase and wavelet banks are most popular filter structures 14
  15. 15. Filter Bank Structures Polyphase Filter Bank [used in all of the MPEG-1 encoders]  Signal is separated into subbands, the widths of which are equal over the entire frequency range  The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during decoding process) 15
  16. 16. Filter Bank Structures Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]  Unlike polyphase filter, the widths of the subbands are not evenly spaced (narrower for higher frequencies)  This allows for better time resolution (ex. short attacks), but at expense of frequency resolution 16
  17. 17. Noise Allocation System Task: derive and apply shifted hearing threshold to the input signal  Anything below the threshold doesn’t need to be transmitted  Any noise below the threshold is irrelevant Frequency component quantization  Tradeoff between space and noise  Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold - this is known as noise allocation 17
  18. 18. Noise Allocation Pre-echo  In case a single audio block contains silence followed by a loud attack, pre-echo error occurs - there will be audible noise in the silent part of the block after decoding  This is avoided by pre-monitoring audio data at encoding stage and separating audio into shorter blocks in potential pre-echo case  This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking) 18
  19. 19. Additional Encoding Techniques Other encoding techniques techniques are available (alternative or in combination)  Predictive Coding  Coupling / Delta Encoding  Huffman Encoding 19
  20. 20. Additional Encoding Techniques Predictive Coding  Often used in speech and image compression  Estimates the expected value for each sample based on previous sample values  Transmits/stores the difference between the expected and received value  Generates an estimate for the next sample and then adjusts it by the difference stored for the current sample  Used for additional compression in MPEG2 AAC (Advance audio Coding) 20
  21. 21. Additional Encoding Techniques Coupling / Delta encoding  Used in cases where audio signal consists of two or more channels (stereo or surround sound)  Similarities between channels are used for compression  A sum and difference between two channels are derived; difference is usually some value close to zero and therefore requires less space to encode  This is a case of lossless encoding process 21
  22. 22. Additional Encoding Techniques Huffman Coding  Information-theory-based technique  An element of a signal that often reoccurs in the signal is represented by a simpler symbol, and its value is stored in a look-up table  Implemented using a look-up tables in encoder and in decoder  Provides substantial lossless compression, but requires high computational power and therefore is not very popular  Used by MPEG1 and MPEG2 AAC 22
  23. 23. Encoding - Final Stages Audio data packed into frames Frames stored or transmitted 23
  24. 24. Questions 24