Lecture 8 audio compression

Published in: Education, Technology

  1. Audio Compression Techniques
     Lecture 8
     Prepared by Razia Nisar Noorani
  2. Introduction
     Digital Audio Compression
       • Removal of redundant or otherwise irrelevant information from an audio signal
       • Audio compression algorithms are often referred to as "audio encoders"
     Applications
       • Reduces required storage space
       • Reduces required transmission bandwidth
  3. Audio Compression
     Audio signal overview
       • Sampling rate (number of samples per second)
       • Bit rate (number of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbit/s
       • Number of channels (mono / stereo / multichannel)
     The amount of data is reduced by lowering those values or by data compression / encoding
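
The bit rate quoted on this slide follows directly from the three parameters listed. A minimal sketch of the arithmetic (the function name `pcm_bit_rate` is a hypothetical helper, not something from the lecture):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality audio: 44.1 kHz, 16 bits per sample, stereo.
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)      # 1411200 bits per second, i.e. about 1.4 Mbit/s
print(cd_rate / 8)  # 176400.0 bytes per second
```

Halving any one of the three parameters halves the bit rate, which is the "lowering those values" option on the slide; compression aims for a similar saving without giving up sampling rate, bit depth, or channels.
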
  4. Audio Data Compression
     Redundant information
       • Implicit in the remaining information
       • Example: an oversampled audio signal
       • Oversampling is the process of sampling a signal at a sampling frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled
     Irrelevant information
       • Perceptually insignificant
       • Cannot be recovered from the remaining information
  5. Audio Data Compression
     Lossless Audio Compression
       • Removes redundant data
       • Resulting signal is the same as the original (perfect reconstruction)
     Lossy Audio Encoding
       • Removes irrelevant data
       • Resulting signal is similar to the original
  6. Audio Data Compression
     Audio vs. Speech Compression Techniques
       • Speech compression uses a model of the human vocal tract to compress signals
       • Audio compression does not use this technique because general audio signals vary far more widely
  7. Generic Audio Encoder
     Psychoacoustic Model
       • Psychoacoustics: the study of how sounds are perceived by humans
       • Uses perceptual coding: eliminates information from the audio signal that is inaudible to the ear
       • Detects conditions under which different audio signal components mask each other
  8. Psychoacoustic Model
     Signal Masking
       • Threshold cut-off
       • Spectral (Frequency / Simultaneous) Masking
       • Temporal Masking
     Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain
  9. Signal Masking
     Threshold cut-off
       • Hearing threshold level: a function of frequency
       • Any frequency component below the threshold will not be perceived by the human ear
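
The slide does not give a formula for the hearing threshold. As an illustration only, the sketch below uses a widely quoted closed-form approximation of the threshold in quiet (attributed to Terhardt), which is an assumption added here and not part of the lecture. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in quiet, in dB SPL, for a frequency in Hz.

    Assumption: Terhardt-style approximation; real encoders use tabulated
    values defined by their standard.
    """
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz: float, level_db: float) -> bool:
    """Threshold cut-off: a component below the threshold is not perceived."""
    return level_db > threshold_in_quiet_db(freq_hz)


# A 10 dB SPL tone is inaudible at 100 Hz but audible near 3.3 kHz,
# where the ear is most sensitive.
print(is_audible(100.0, 10.0))   # False
print(is_audible(3300.0, 10.0))  # True
```
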
  10. Signal Masking
      Spectral Masking
        • A frequency component can be partly or fully masked by another component that is close to it in frequency
        • This shifts the hearing threshold
  11. Signal Masking
      Temporal Masking
        • A quieter sound can be masked by a louder sound if they are temporally close
        • Sounds that occur shortly before or shortly after the louder sound can be masked
  12. Spectral Analysis
      Spectral analysis is performed by a device or algorithm that produces a frequency-domain representation of a time-domain signal.
      Tasks of Spectral Analysis
        • To derive masking thresholds to determine which signal components can be eliminated
        • To generate a representation of the signal to which masking thresholds can be applied
      Spectral analysis is done through transforms or filter banks
  13. Spectral Analysis
      Transforms
        • Fast Fourier Transform (FFT)
        • Discrete Cosine Transform (DCT): similar to the Fourier transform but uses cosine basis functions only
        • Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]: an overlapped and windowed version of the DCT
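
To make the "cosine values only" idea concrete, here is a minimal sketch of an orthonormal DCT-II and its inverse written directly from the definition. It is a plain, slow O(N²) DCT, not the overlapped and windowed MDCT that the codecs above use, and the function names are hypothetical.

```python
import math


def _scale(k: int, n: int) -> float:
    # Orthonormal scaling so that the inverse uses the same cosine basis.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k)
                           for i in range(n))
        for k in range(n)
    ]


def idct2(coeffs):
    """Inverse of dct2 (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
            for k in range(n))
        for i in range(n)
    ]


# A block that matches one cosine basis function ends up in one coefficient.
block = [math.cos(math.pi / 8 * (i + 0.5) * 3) for i in range(8)]
print([round(c, 6) for c in dct2(block)])
```

For this block all the energy lands in coefficient 3, which is the kind of compact frequency-domain representation that masking thresholds are later applied to.
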
  14. Spectral Analysis
      Filter Banks
      A filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency subband of the original signal
        • Time sample blocks are passed through a set of band-pass filters
        • Masking thresholds are applied to the resulting frequency subband signals
        • Polyphase and wavelet banks are the most popular filter structures
  15. Filter Bank Structures
      Polyphase Filter Bank [used in all of the MPEG-1 audio encoders]
        • The signal is separated into subbands of equal width across the entire frequency range
        • The resulting subband signals are downsampled to create shorter signals (from which the signal is later reconstructed during decoding)
  16. Filter Bank Structures
      Wavelet Filter Bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]
        • Unlike in the polyphase filter bank, the subbands are not of equal width (narrower for lower frequencies, wider for higher frequencies)
        • This allows for better time resolution at high frequencies (e.g. short attacks), but at the expense of frequency resolution
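
The two filter bank slides describe splitting into subbands, downsampling, and reconstructing at the decoder. The sketch below shows those steps with the simplest possible two-band split, the Haar filter pair, applied repeatedly to the low band to obtain subbands of unequal width. The Haar filters are an assumption chosen for brevity; the filters used by MPEG-1 or EPAC are much longer. All function names are hypothetical.

```python
import math

S = math.sqrt(0.5)


def haar_analysis(x):
    """Split x (even length) into low and high subbands, each downsampled by 2."""
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return low, high


def haar_synthesis(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * S)
        out.append((l - h) * S)
    return out


def wavelet_decompose(x, levels):
    """Re-split the low band at every level, so band widths halve each time."""
    bands, low = [], list(x)
    for _ in range(levels):
        low, high = haar_analysis(low)
        bands.append(high)
    bands.append(low)
    return bands


def wavelet_reconstruct(bands):
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = haar_synthesis(low, high)
    return low


signal = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0, -1.0, 2.0]
bands = wavelet_decompose(signal, levels=3)
print([len(b) for b in bands])  # [4, 2, 1, 1]
```

The total number of subband samples equals the input length, and `wavelet_reconstruct(bands)` returns the original samples up to floating-point rounding.
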
  17. Noise Allocation
      System task: derive and apply the shifted hearing threshold to the input signal
        • Anything below the threshold doesn't need to be transmitted
        • Any noise below the threshold is irrelevant
      Frequency component quantization
        • Tradeoff between space and noise
        • The encoder saves space by using just enough bits for each frequency component to keep the noise under the threshold; this is known as noise allocation
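
One way to picture "just enough bits" is the rule of thumb that each extra bit of a uniform quantizer lowers the quantization noise by about 6.02 dB. That rule is an assumption added here, not something stated in the lecture, and real encoders use more elaborate iterative loops. The sketch below picks, per subband, the smallest bit count that pushes the noise under the masking threshold.

```python
import math

DB_PER_BIT = 6.02  # assumed noise reduction per quantizer bit


def allocate_bits(signal_db, threshold_db):
    """Bits per subband so that quantization noise stays under the threshold.

    signal_db and threshold_db are per-subband levels in dB (hypothetical
    inputs; a real encoder gets them from its psychoacoustic model).
    """
    bits = []
    for level, threshold in zip(signal_db, threshold_db):
        signal_to_mask = level - threshold
        # A band already below its threshold needs no bits at all.
        bits.append(max(0, math.ceil(signal_to_mask / DB_PER_BIT)))
    return bits


print(allocate_bits([60.0, 40.0, 20.0], [30.0, 35.0, 25.0]))  # [5, 1, 0]
```
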
  18. Noise Allocation
      Pre-echo
        • If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding there will be audible noise in the silent part of the block
        • This is avoided by monitoring the audio data ahead of encoding and splitting the audio into shorter blocks wherever pre-echo could occur
        • This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
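
A rough sketch of the block-splitting idea: compare the energy of consecutive sub-blocks and switch to short blocks when the energy jumps sharply. The detection rule, the `ratio` and `floor` values, and the function names are all hypothetical choices made for illustration; actual encoders use their own attack detectors.

```python
def energy(samples):
    return sum(s * s for s in samples)


def needs_short_blocks(block, parts=4, ratio=10.0, floor=1e-9):
    """True if the energy of some sub-block jumps by more than `ratio`."""
    size = len(block) // parts
    energies = [energy(block[i * size:(i + 1) * size]) for i in range(parts)]
    return any(cur > ratio * max(prev, floor)
               for prev, cur in zip(energies, energies[1:]))


def split_block(block, parts=4):
    """Return one long block, or `parts` short blocks if an attack is found."""
    if not needs_short_blocks(block, parts):
        return [block]
    size = len(block) // parts
    return [block[i * size:(i + 1) * size] for i in range(parts)]


attack = [0.0] * 12 + [1.0] * 4   # silence followed by a loud attack
steady = [0.5] * 16               # no sudden change
print(len(split_block(attack)))   # 4
print(len(split_block(steady)))   # 1
```

With short blocks, quantization noise from the attack can only spread over the short block that contains it, which is what keeps the pre-echo brief.
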
  19. Additional Encoding Techniques
      Other encoding techniques are available (as alternatives or in combination)
        • Predictive Coding
        • Coupling / Delta Encoding
        • Huffman Encoding
  20. Additional Encoding Techniques
      Predictive Coding
        • Often used in speech and image compression
        • Estimates the expected value of each sample based on previous sample values
        • Transmits/stores the difference between the expected value and the actual value
        • The decoder generates the same estimate for each sample and then adjusts it by the stored difference
        • Used for additional compression in MPEG-2 AAC (Advanced Audio Coding)
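
The predict-then-store-the-difference loop can be sketched with the simplest possible predictor, which assumes each sample equals the previous one. This first-order predictor is chosen for illustration only; AAC and speech codecs use more capable predictors. The function names are hypothetical.

```python
def predictive_encode(samples):
    """Store the difference between each sample and its predicted value."""
    residuals = []
    prediction = 0  # nothing is known before the first sample
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # predict that the next sample repeats this one
    return residuals


def predictive_decode(residuals):
    """Form the same prediction and correct it by the stored difference."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


pcm = [10, 12, 13, 13, 11]
residuals = predictive_encode(pcm)
print(residuals)                     # [10, 2, 1, 0, -2]
print(predictive_decode(residuals))  # [10, 12, 13, 13, 11]
```

After the first sample the residuals are small numbers near zero, which take fewer bits to store than the samples themselves.
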
  21. Additional Encoding Techniques
      Coupling / Delta Encoding
        • Used when the audio signal consists of two or more channels (stereo or surround sound)
        • Similarities between channels are used for compression
        • The sum and the difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode
        • This is a lossless encoding process
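
A minimal sketch of sum/difference coupling on integer samples, showing that the step is exactly reversible. The function names are hypothetical, and practical codecs typically store a halved sum to avoid the extra bit this version needs.

```python
def couple(left, right):
    """Replace left/right channels by their sum and difference."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def decouple(total, diff):
    """Recover left/right exactly: total + diff = 2 * left, total - diff = 2 * right."""
    left = [(t + d) // 2 for t, d in zip(total, diff)]
    right = [(t - d) // 2 for t, d in zip(total, diff)]
    return left, right


left = [100, 101, 99]
right = [98, 101, 100]
total, diff = couple(left, right)
print(diff)                   # [2, 0, -1]
print(decouple(total, diff))  # ([100, 101, 99], [98, 101, 100])
```
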
  22. Additional Encoding Techniques
      Huffman Coding
        • Information-theory-based technique
        • An element that recurs often in the signal is represented by a shorter code, and the mapping between codes and values is stored in a look-up table
        • Implemented using look-up tables in the encoder and in the decoder
        • Provides substantial lossless compression, but requires high computational power and is therefore not very popular
        • Used by MPEG-1 (Layer III) and MPEG-2 AAC
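
The look-up table idea can be sketched by building a Huffman code from symbol counts with Python's standard `heapq` module. This builds the table from the data itself, which is an assumption made for illustration; MPEG audio codecs use predefined tables fixed by the standard. The function names are hypothetical and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)  # the two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so the first match is right
            out.append(inverse[current])
            current = ""
    return out


data = list("aaaabbc")
table = huffman_table(data)
bits = huffman_encode(data, table)
print(len(bits))  # 10 bits, versus 14 with a fixed 2-bit code
```
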
  23. Encoding: Final Stages
        • Audio data is packed into frames
        • Frames are stored or transmitted
  24. Questions
