Lecture 8 audio compression








Usage Rights

© All Rights Reserved

  • Hello. Today I will talk about the techniques commonly used for digital audio compression in various audio file formats.
  • - I will discuss the difference between redundant and irrelevant information further in my presentation.
    - Depending on whether the audio is stored or transmitted, the size is optimized accordingly.

Lecture 8 audio compression: Presentation Transcript

  • Audio Compression Techniques, Lecture 8. Prepared by Razia Nisar Noorani
  • Introduction Digital Audio Compression  Removal of redundant or otherwise irrelevant information from audio signal  Audio compression algorithms are often referred to as “audio encoders” Applications  Reduces required storage space  Reduces required transmission bandwidth 2
  • Audio Compression Audio signal – overview  Sampling rate (# of samples per second)  Bit rate (# of bits per second). Typically, uncompressed stereo 16-bit 44.1KHz signal has a 1.4MBps bit rate  Number of channels (mono / stereo / multichannel) Reduction by lowering those values or by data compression / encoding 3
  • Audio Data Compression Redundant information  Implicit in the remaining information  Ex. oversampled audio signal  oversampling is the process of sampling a signal with a sampling frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled Irrelevant information  Perceptuallyinsignificant  Cannot be recovered from remaining information 4
  • Audio Data Compression Lossless Audio Compression  Removes redundant data  Resulting signal is same as original – perfect reconstruction Lossy Audio Encoding  Removes irrelevant data  Resulting signal is similar to original 5
  • Audio Data Compression Audio vs. Speech Compression Techniques  Speech Compression uses a human vocal tract model to compress signals  Audio Compression does not use this technique due to larger variety of possible signal variations 6
  • Generic Audio Encoder
    Psychoacoustic Model
    - Psychoacoustics: the study of how sounds are perceived by humans
    - Uses perceptual coding: eliminates information from the audio signal that is inaudible to the ear
    - Detects conditions under which different audio signal components mask each other
  • Psychoacoustic Model Signal Masking  Threshold cut-off  Spectral (Frequency / Simultaneous) Masking  Temporal Masking Threshold cut-off and spectral masking occur in frequency domain, temporal masking occurs in time domain 8
  • Signal Masking Threshold cut-off  Hearing threshold level – a function of frequency  Any frequency components below the threshold will not be perceived by human ear 9
  • Signal Masking Spectral Masking A frequency component can be partly or fully masked by another component that is close to it in frequency  This shifts the hearing threshold 10
  • Signal Masking Temporal Masking A quieter sound can be masked by a louder sound if they are temporally close  Sounds that occur both (shortly) before and after volume increase can be masked 11
  • Spectral Analysis a device or algorithm that identifies a frequency domain representation of a time domain signal. Tasks of Spectral Analysis  To derive masking thresholds to determine which signal components can be eliminated  To generate a representation of the signal to which masking thresholds can be applied Spectral Analysis is done through transforms or filter banks 12
  • Spectral Analysis Transforms  Fast Fourier Transform (FFT)  Discrete Cosine Transform (DCT) - similar to FFT but uses cosine values only  Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3] – overlapped and windowed version of DCT 13
  • Spectral Analysis Filter Banks a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency subband of the original signal  Time sample blocks are passed through a set of bandpass filters  Masking thresholds are applied to resulting frequency subband signals  Poly-phase and wavelet banks are most popular filter structures 14
  • Filter Bank Structures Polyphase Filter Bank [used in all of the MPEG-1 encoders]  Signal is separated into subbands, the widths of which are equal over the entire frequency range  The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during decoding process) 15
  • Filter Bank Structures Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]  Unlike polyphase filter, the widths of the subbands are not evenly spaced (narrower for higher frequencies)  This allows for better time resolution (ex. short attacks), but at expense of frequency resolution 16
  • Noise Allocation System Task: derive and apply shifted hearing threshold to the input signal  Anything below the threshold doesn’t need to be transmitted  Any noise below the threshold is irrelevant Frequency component quantization  Tradeoff between space and noise  Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold - this is known as noise allocation 17
  • Noise Allocation Pre-echo  In case a single audio block contains silence followed by a loud attack, pre-echo error occurs - there will be audible noise in the silent part of the block after decoding  This is avoided by pre-monitoring audio data at encoding stage and separating audio into shorter blocks in potential pre-echo case  This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking) 18
  • Additional Encoding Techniques Other encoding techniques techniques are available (alternative or in combination)  Predictive Coding  Coupling / Delta Encoding  Huffman Encoding 19
  • Additional Encoding Techniques Predictive Coding  Often used in speech and image compression  Estimates the expected value for each sample based on previous sample values  Transmits/stores the difference between the expected and received value  Generates an estimate for the next sample and then adjusts it by the difference stored for the current sample  Used for additional compression in MPEG2 AAC (Advance audio Coding) 20
  • Additional Encoding Techniques Coupling / Delta encoding  Used in cases where audio signal consists of two or more channels (stereo or surround sound)  Similarities between channels are used for compression  A sum and difference between two channels are derived; difference is usually some value close to zero and therefore requires less space to encode  This is a case of lossless encoding process 21
  • Additional Encoding Techniques Huffman Coding  Information-theory-based technique  An element of a signal that often reoccurs in the signal is represented by a simpler symbol, and its value is stored in a look-up table  Implemented using a look-up tables in encoder and in decoder  Provides substantial lossless compression, but requires high computational power and therefore is not very popular  Used by MPEG1 and MPEG2 AAC 22
  • Encoding: Final Stages
    - Audio data is packed into frames
    - Frames are stored or transmitted
  • Questions