SlideShare a Scribd company logo
1 of 39
Download to read offline
Comparative Study of Analysis Filter
Bank Designs for AC-3 Encoding
Undergraduate Student Project
Mark Joseph L. Tan
2002-22034
B.S. Computer Engineering
Adviser
Franz A. de Leon, MS EE
University of the Philippines, Diliman
Approval Sheet
In partial fulfillment of the requirements for the degree of Bachelor of Science in
Computer Engineering, this project entitled “Comparative Study of Analysis Filter Bank
Designs for AC-3 Encoding” prepared and submitted by Mark Joseph L. Tan, is hereby
recommended for approval.
__________________
Franz de Leon
Adviser
Accepted in partial fulfillment of the requirements for the degree of Bachelor of
Science in Computer Engineering
Approved by:
__________________
Panel
__________________
Panel
__________________
Reader
__________________
EEE190 Handler
Abstract
Several high-fidelity digital audio formats have been developed with the challenge of
developing audio coding techniques intended for low cost, limited bandwidth and storage
capacity.
AC-3 is one of these international standards. It has been widely used as the audio
coding standard for laserdiscs, cinemas and DVD films. AC-3 has been accepted by the
Advanced Television Systems Committee (ATSC) as the audio coding standard for US
Grand Alliance High Definition Television (HDTV) system.
This project is a comparative study of analysis filter bank designs for AC-3 encoding.
The standard algorithm will first be implemented in stereo mode using software.
Different optimization techniques on the analysis filter bank designs will then be
performed. Finally, listening and speed performance tests will be done to compare the
optimized codes with the standard implementation.
The project will aid in future designs concerning the AC-3 audio coding standard,
with the goal of improving performance and efficiency for high-quality digital audio
compression.
Table of Contents
1. Introduction
1.1. Background of the Project
1.2. Project Objectives
1.3. Overview
2. Literature Review
2.1. Standard Digital Audio Format
2.1.1. Standard Formatss
2.1.2. MPEG-Based Proprietary Formats
2.1.3. Non-MPEG Proprietary Formats
2.2. AC-3 Audio Coding Standard
2.2.1. History and Development
2.2.2. AC-3 Compression Features
2.2.3. AC-3 Comparative Advantages
2.2.4. AC-3 Bit Stream
2.3. AC-3 Encoding Algorithm
2.3.1. Analysis Filter Bank
2.3.2. Spectral Envelope Encoding
2.3.3. Encode Bit Allocation
2.3.4. Mantissa Quantization
2.3.5. Frame Formatting and Bit Stream Packing
2.4. AC-3 Decoding Algorithm
2.4.1. Frame De-Formatting and Bit Stream Unpacking
2.4.2. Spectral Envelope Decoding
2.4.3. Decode Bit Allocation
2.4.4. Mantissa De-Quantization
2.4.5. Synthesis Filter Bank using IMDCT
2.5 Related Works
2.5.1 The ATSC Standard
2.5.2 FFmpeg Multimedia System
3 Methodology
3.1 Design
3.1.1 Direct Method
3.1.2 Indirect Method using DCT-IV
3.1.3 Indirect Method using FFT
3.1.4 Recursive Method
3.2 Implementation
3.3 Testing
3.3.1 Preliminary Test
3.3.2 Objective Tests
3.3.3 Formal Listening Test
3.3.4 Playback
3.3.5 Listening Conditions
4 Results and Analysis
4.1 Objective Measurements
Chapter 1
Introduction
Digital Audio Compression (AC-3) is the 3rd
generation audio coding algorithm
developed by the Digital coder group at Dolby Laboratories which eventually became the
audio coding standard for the US Grand Alliance HDTV system. The primary motivation
in developing the AC-3 audio coding standard was the need for a superior multi-channel
audio coder. AC-3 became one of the most successful proprietary digital audio formats
available in the market. It is usually integrated with video applications and is rarely used
as stand alone. Thus, even though it has been around since 1990’s, it is not as prominent
as the other consumer audio formats such as MPEG Layer III (mp3).
AC-3 has a wide variety of applications. It can found in laserdiscs, digital cable
and even satellite transmissions. But it is famous for being the audio format used in
cinemas and DVD’s. Like other popular codecs, it has a perceptual coding algorithm that
uses minimal digital information and lower data rates which allows minimum
degradation of quality. Thus, the sound almost exactly the same as the original sound
when decoded for playback providing true surround.
Currently, there are several audio decoders that are capable of AC-3 playback.
However, it is very rare to find audio encoders and decoders that are solely dedicated to
handling the AC-3 format.
1.1 Background of the Project
Figure 1-1 Project Overview
The project is a comparative study of the analysis filter bank designs for the AC-3
encoding which will be subdivided into two major parts. The first part would be the
software development of the AC-3 audio coding standard’s encoding algorithm while the
second part would be the implementation of analysis filter bank designs that will modify
the traditional algorithm. The project overview is shown in Figure 1-1.
Traditional
Encoding Algorithm
Modified Encoding
Algorithms
Final
Code
Comparative
Analysis
Objective
Analysis
For the development of AC-3 encoding algorithm, each of the different coding
blocks of this audio format will be identified, implemented and tested. Modification will
then be carried out in order decrease the amount of computations, thus, minimizing the
complexity. This will be verified by testing procedures which will include listening tests
to assess the quality of the audio file and tests to verify computational complexity.
The implementation of the encoding algorithm as well as the optimization
techniques will be based on various researches that deal with the AC-3 audio coding
standard.
1.2 Project Objectives
One of the major factors considered in creating perceptual audio coders is the
need for real-time processing. The speed of the encoding algorithm is directly related to
the computational complexity and the number of cycles it takes to process an audio
block. Since a large part of the calculation occurs in the filter bank, the interest of the
project lies on having a comparative analysis of different filter bank designs for the AC-3
encoding algorithm.
This will be done by implementing each block of the coding process to create an
encoder that operates in stereo mode and applying different methods for optimizing the
performance. The project aims to optimize a traditional AC-3 encoding algorithm to
reduce its computational complexity.
Lastly, it is of great importance to characterize AC-3 compression features since
this will be the first AC-3 related project to be conducted by DSP Laboratory. The project
aims to look into particular AC-3 features that make it one of the most powerful
perceptual audio standards.
1.3 Overview
The components of the AC-3 encoding algorithm can be grouped into five major
blocks. These are the: (1) analysis filter banks which perform the time-frequency
transformation of the input signal, (2) spectral envelope encoding where the power
spectral density is analyzed, (3)-(4) bit allocation and mantissa quantization that use
psychoacoustic models to manipulate the mantissa component of the input signal and (5)
formatting & bit stream packing that prepares the audio data for encoding. [1]
Several of the AC-3 related studies that have been carried out focused on
implementing the encoding algorithm and improving its performance such as those in [8]
[11] [12]. The emphasis of the optimization methods have been put mainly on the
analysis filter bank which aims to minimize complexity by decreasing the number of
computations needed by the encoding algorithm. This is because it is one of the most
calculation intensive processes in the algorithm. Thus, this project will compare the
optimization methods on the analysis filter bank design.
The project will be an implementation of the AC-3 audio coding standard’s
encoding algorithm using software. The entire standard algorithm along with the analysis
filter bank designs will be implemented using the C/C++ programming language.
Testing will be done using two methods, first would be objective tests that will
measure accuracy and speed to check the number of computations made. Second would
be subjective audio listening tests to examine quality and check if there are noticeable
differences on the generated AC-3 audio format.
Chapter 2
Literature Review
2.1 Standard Digital Audio Formats
The introduction of the audio compact disc (CD) technology in 1982 set off the
transformation of audio media from analog to digital format. The single purpose of CD
back then was to hold music in digital form. However, huge amount of data is required to
represent the digital audio file which made it necessary to develop audio coding
techniques for efficient transmission and storage. [1]
The goal of these coding techniques is to efficiently use the least amount of bits to
represent a digital audio file without making any obvious changes in the original data
through transparent coding. Audio coding technology has experienced tremendous
development that it has become possible to successfully remove as much as 90% of audio
data during compression without noticeable changes in the original from.
There are several international and commercially available audio coding standards
which can loosely be classified as Standard Formats, MPEG-Based Proprietary Formats,
and Non-MPEG Proprietary Formats. Table 2-1 shows some examples of each. [2]
Standard Formats MPEG-Based
Proprietary Formats
Non-MPEG
Proprietary Formats
Pulse Code Modulation (PCM) AT&T’s a2b AC-3
Differential PCM MP4 EPAC
Adaptive Differential PCM Liquid Audio Windows Media Audio (WMA)
u-law Compression Apple QuickTime Real Audio
ISO/MPEG Family Transparent Audio Compression (TAC)
TwinVQ (VQF)
Table 2-1 – Audio Coding Standards
2.1.1. Standard Formats
PCM is a type of format that can be read by several audio applications. DPCM,
APCM and u-law compressions are simple variations of lossy compression formats that
store only the difference between consecutive samples instead of the entire information.
[2]
The Moving Pictures Experts Groups (MPEG) under the International
Organization for Standardization (ISO) has developed several standards under the
ISO/MPEG Family. These include the MPEG-1 Layers I, II, and III - which can operate
in mono or stereo modes; MPEG-2 Layers I, II, III, and AAC - which can operate in
multi-channel; and MPEG-4 – which can operate in multi-channels low-bit rates.
2.1.2. MPEG-Based Proprietary Formats
The ISO/MPEG Family consists of notable open standards which have become
widely used. Because of this, several proprietary formats have been developed which are
MPEG-based. These enhancements in the ISO/MPEG standards paved way for the
development of sophisticated audio formats that support several features. [2]
2.1.3. Non-MPEG Proprietary Formats
There are also several Non-MPEG Proprietary Formats available in the market.
These formats offer a wide array of applications and claim to produce quality audio
sound that is indistinguishable from the original CD. Among all of these, AC-3 has
emerged to be the de facto standard found on the latest generation of laserdiscs, cinemas,
and DVD films because of its audio coding technique that makes it capable of delivering
high-fidelity audio content. [2]
2.2 AC-3 Audio Coding Standard
2.2.1. History and Development
In 1987, the Federal Communications Commission (FCC) formed the Advisory
Committee on Advanced Television Service (ACATS) in the United States.[6] ACATS
formally started the High Definition Television (HDTV) standardization process with the
aim of creating a an excellent multi-channel audio format for HDTV sound.
Dolby AC-1, a low-cost delta-modulation based coding technology was
developed. By 1989, advances in the audio coding technology allowed the development
of AC-2, a higher quality audio coding standard with reduced bit-rate. [6]
A real-time implementation of AC-3 was first conducted in multiple DSP cards.
By 1991, Star Trek VI became the first AC-3 encoded film. This multi-channel audio
compression standard was accepted by ATSC in 1994.
2.2.2. AC-3 Compression Features
One of the key features of the AC-3 audio standard is the use some very
sophisticated psychoacoustic models which allow channels to be combined and
separated. By using a global bit allocation technique, it allows for any channel to borrow
bits which are not being used.
It has been designed to take maximum advantage of human auditory masking in
that it divides the audio spectrum of each channel into narrow frequency bands of
different sizes optimized with respect to the frequency selectivity of human hearing. This
makes it possible to sharply filter coding noise so that it is forced to stay very close in
frequency to the frequency components of the audio signal being coded. By reducing or
eliminating coding noise wherever there are no audio signals to mask it, the sound quality
of the original signal can be subjectively preserved. [3]
AC-3 provides only full range channels so the sounds are much better in terms of
quality and allows monophonic up to 5.1 audio channels to provide the necessary
channels needed for a digital film audio system. The configuration for a 5.1 channel setup
is composed of five full bandwidth channels: Left, Right, Left Surround, Right Surround,
Center, and the Subwoofer is shown in Figure 2-1. Details of the AC-3 audio coding
standard are also shown in the table below. [4]
Figure 2-1 – AC-3 5.1 channel configuration
General Characteristics of AC-3 Audio Standard
Sample frequency 32 kHz, 44.1kHz, 48 kHz
Bit Depth 16 – 24 bits
Bit Rate (B) 32 – 640 kbps
Frame Size 1536 samples
Compression 10:1
DVD Characteristics of AC-3 Audio Standard
Maximum Sampling Frequency 192 kHz
Maximum Samples per second 192,000
Maximum Bit Depth 24-bits
Table 2-2 – AC-3 Characteristics
2.2.3. AC-3 Comparative Advantages
Evaluating audio compression is complicated because there is no objective quality
parameter. Instead, comparison of AC-3 with two other popular digital audio formats, CD
Quality (PCM) and MPEG Layer III (MP3) is shown in the Table 2-3.
RIGHT
SURROUND
LEFT
SURROUND
RIGHTLEFT
CENTER
(SUBWOOFER)
CD Quality MPEG Layer III* AC-3*
Precision Sampling Accuracy
(Bits/sample/channel)
16-bit 16-bit 24-bit
Gradiation Levels 65,536 65,536 16,777,216
Rate Sampling Rate ( Freq ) 44.1kHz 44.1 kHz 96 kHz
Samples per second 44,100 44,100 96,000
Bit Rate in bits/sec 1,411,200 128,000 192,000
Bit Rate in Kbytes/sec 176 16 24
Size MB/min 10 1 1.5
Compression 1:1 1:12 1:10
* Most commonly used configuration
Table 2-3 - Compares Attributes of AC-3 versus CD audio and MPEG Layer III
Precision values clearly show that AC-3 has a more accurate signal representation
in Figure 2-2. The 24-bit depth allows it to have 256 times more gradiation levels than
the CD quality. Despite of this, AC-3 can still compress audio data by a factor of 10 due
to its very high compression technique. Although this is not evident in stereo mode (since
AC-3 is designed to be most advantageous in multi-channel configurations), its
compression factor is still comparable to MPEG Layer III.
Figure 2-2 – Graphical Comparison of CD Audio with AC-3
2.2.4. AC-3 Bit Stream
An AC-3 Bit Stream is composed of a series of frames representing a constant
time interval of 1536 PCM samples across all coded channels. Each frame is an
independent entity and no data are shared with previous frames aside from the inherent
transform overlap. The frame has a fixed size and depends only on the sample rate and
coded data rate. It consists of 6 audio blocks with 256 samples each. The structures of the
AC-3 Frame and Audio Block are shown in Tables 2-4 and Figure 2-5.
SYNC CRC1 SI BSI AB0 AB1 AB2 AB3 AB4 AB5 AUX CRC2
SYNC It maintains synchronization with the encoder / decoder.
CRC1 and CRC2 These are means of error detection.
SI and BSI The Sync Information (SI) and Bit Stream Information (BSI) include
sample rate, data rate, and number of coded channels.
AB0-AB5 These are the 6 audio blocks, each representing 256 PCM samples
per coded channel.
AUX optional aux data for embedding private control or status
information
Table 2-4 – AC-3 Frame Structure
Block
Switch
Flags
Dither
Flags
Dynamic
Range
Control
Coupling
Strategy
Coupling
Coordinates
Exponent
Strategy
Exponents Bit
Allocation
Parameters
Mantissa
Table 2-5 – AC-3 Audio Block Structure
2.3 AC-3 Encoding Algorithm
The AC-3 encoding algorithm is composed of five major blocks as shown in
Figure 2-7. These are the Analysis Filter Bank, Spectral Envelope Encoding, Encode Bit
Allocation, Mantissa Quantization, and Frame Formatting and Bitstream Packing. The
block diagram below shows the AC-3 encoding process.
Figure 2-7 – AC-3 Encoder Block Diagram
2.3.1. Analysis Filter Bank
This block is subdivided into two smaller blocks which are the frequency domain
transformation and the block floating points as shown in Figure 2-8.
Analysis Filter Bank
Mantissa Quantization Spectral Envelope Encoding
Exponents
Encode Bit Allocation
PCM time samples
Frame Formatting and Bitstream Packing
Quantized Mantissa
Mantissa
Encoded Spectral Envelope
Encoded AC-3 bit stream
Figure 2-8 – Analysis Filter Bank Block Diagram
Frequency Domain Transformation using MDCT
The PCM time samples are best examined in the frequency domain due to the
frequency masking properties of human hearing. To transform the signal from a sequence
of PCM time samples to a sequence of frequency coefficient blocks, a 512-point
Modified Discrete Cosine Transform (MDCT) with 50% overlap is first performed.
The result is then multiplied by a time window to transform the input signal to
frequency domain. This MDCT is based on the idea of Time Domain Alias Cancellation
(TDAC) introduced by Princen and Bradley. [11] Normally, the analysis filter bank uses
the 512-point MDCT (long transform). But in the event of transient signals, this method
can be improved by using two 256-point transforms (short transforms) instead.
The 512 x 256 matrix that transforms the 512 windowing samples results in 256
frequency coefficients. The equation of this matrix is shown below:
cos( (2 1)(2 1_ (2 1))
1024 4
512 256
nkL n k k
where L x matrix
nk each entry of L
π π+
+
−
< > = + + + +
=
=
The 256 x 128 matrices that transforms the 512 windowing samples results in two
sets of 128 frequency coefficients. The equations of these matrices are shown below:
1
1
1
cos( (2 1)(2 1))
512
256 128
nkS n k
where S x matrix
nk each entry of S
π+
+
+
< > = + +
=
=
2
2
2
cos( (2 1)(2 1) (2 1))
512 2
256 128
nkS n k k
where S x matrix
nk each entry of S
π π+
+
+
< > = + + + +
=
=
PCM time samples
Analysis Filter Bank
Frequency Domain
Transformation
Block Floating-PointMantissa Exponent
Block Floating-Point
The input to this block is the set of all transform coefficients that have been
computed in the frequency domain transformation block. The floating point block simply
breaks down each of the frequency transform coefficients into binary exponent and
mantissa parts which will be analyzed separately in the succeeding blocks. [1]
2.3.2. Spectral Envelope Encoding
The Spectral Envelope Encoding involves two main processes, the power spectral
mapping and frequency banding. The exponents are first encoded according to time and
frequency resolution where they are mapped to the power spectrum so that the
computation of this spectrum can be simplified.
The frequency banding then maps the MDCT frequency domain to Fourier
Frequency domain where the psychoacoustic model is used to determine the perceptual
resolution.
Figure 2-9 – Spectral Envelope Encoding Block Diagram
2.3.3. Encode Bit Allocation
The goal of bit-rate reduction is to remove bits from a representation of an audio
signal in a manner that is inaudible to humans. The bit allocation block of the AC-3
coding standard uses the spectral envelope to determine the bit allocation required to
encode the individual mantissa. This block performs a convolution of the power spectral
density with the spreading function matching the ear’s masking curve. The bit allocation
block can be subdivided into smaller blocks as shown in Figure 2-10 below. [1]
Figure 2-10 – Encode Bit Allocation Block Diagram
Bit
Assignment
Transform
Coefficient
Exponents
Spectral Envelope Encoding
Power Spectrum
Mapping
Frequency Banding
Power
Spectral
Density
Encoded Spectral Envelope
Encode Bit Allocation
Spreading
Function
Hearing
Threshold
Masking
Comparison
Power
Spectral
Density
There are two broad classes of bit-allocation strategy. These are the forward
adaptive bit allocation and the backward adaptive bit allocation. AC-3 uses hybrid
forward/backward bit allocation, a fusion of these two methods. [1]
Forward Adaptive Bit Allocation
This is a method where the encoder may precisely calculate an optimum bit
allocation and explicitly code this allocation into the bit stream. It is the most accurate
allocation technique since the encoder has full knowledge of the input signal. Also, since
the psychoacoustic model is in the encoder itself, this can be changed without affecting
the installed encoders. The block diagram is shown in Figure 2-11.
However, there is a need to utilize a portion of the available bit-rate to deliver the
explicit bit allocation to the decoder. Transient conditions require fine time resolution
while steady state conditions require fine frequency resolution. These improved
resolutions require higher allocation in the data rate. Thus, forward adaptive puts
practical limitation on performance at very low bit rates.
Figure 2-11 – Forward Adaptive Bit Allocation Encoder Block Diagram
Backward Adaptive Bit Allocation
This is a method where the encoder calculates bit allocation from the coded audio
data without explicit information from the encoder. It does not require utilization of a
portion of the available bit-rate, thus all of the bits can be allocated to audio coding. It is
more efficient in transmission and allows the bit allocation to have superior time and
frequency resolution. The block diagram is shown in Figure 2-12.
However, the bit allocation must be calculated in the decoder from the
information contained in the bit stream, thus of limited accuracy and may contain small
errors. Also, psychoacoustic model may not be updated since the bit allocation algorithm
in the encoder is fixed.
Bit Allocator
Filter Bank Quantize
Multiplex
Input signal
Bit allocation
Coded bit stream
Figure 2-12 - Backward Adaptive Bit Allocation Encoder Block Diagram
Hybrid Forward/Backward Bit allocation
The Hybrid Forward/Backward Bit allocation is the method implemented by the
AC-3 encoding algorithm. It uses the core backward adaptive bit allocation process found
in both the encoder and the decoder. This core routine is accurate and based on the
psychoacoustic model which has assumptions on the masking properties of signals. The
block diagram is shown in Figure 2-13. The input to the core routine is a spectral
envelope which is part of the encoded audio. In AC-3, the spectral envelope has
time/frequency resolution equal to the analysis filter bank.
Figure 2-13 – Hybrid Forward/Backward Bit Allocation Encoder Block Diagram
2.3.4. Mantissa Quantization
The results of the parametric bit allocation model are used along with the raw
mantissa to perform the mantissa quantization. The mantissas are quantized with a
variable number of bits based on the results of the bit allocation block.
The resulting values in this quantization block are not just composed of a
specified number of significant bits. The mantissa values are scaled and offset to provide
zero-centered, equal-width and symmetrical quantization levels. [5]
Encode Spectral Envelope
Filter Bank Quantize
Core Bit Allocator
Input
signal
Bit allocator Multiplex
Bit Allocation
Side Information
Encode
Spectral
Envelope
Filter Bank Quantize
Multiplex
Input
signal
Bit allocator
Encoded Spectral
Envelope Coded
Bit-stream
Quantized
Samples
Coded
Bit-stream
2.3.5. Frame Formatting and Bit Stream Packing
The frame formatting and bit stream packing is the final stage in the AC-3
encoding algorithm. All information in each of the six blocks is placed in a series of array
and scalar values.
This information includes the bit allocation information, exponents, mantissas,
coupling coefficients and dither flags. [5] This data packing stage places all information
in a single block along with all the necessary data such as the synchronization, header,
and wrapper information. The data is coded logically for the decoder to easy unpack the
bit stream data.
2.4 AC-3 Decoding Algorithm
The AC-3 decoding algorithm is composed of five major blocks as shown in
Figure 2-14. The major blocks are basically the same as those in the encoding process
implemented inversely. These are the Frame De-Formatting and Bit Stream Unpacking,
Spectral Envelope Decoding, Decode Bit Allocation, Mantissa De-quantization, and the
Synthesis Filter Bank. The block diagram of the decoding algorithm is shown below:
Figure 2-14 - AC-3 Decoder Block Diagram
2.4.1. Frame De-Formatting and Bit Stream Unpacking
The AC-3 Frame De-Formatting block shown consists of frame synchronization,
error detection and frame de-formatting stages. The frame synchronization ensures that
the decoder maintains the sync with the incoming input data stream. [7] The encoded
digital audio input enters an input buffer before proceeding to the error detection stage.
Each of the decoder input data block is checked in the error detection stage for
consistency. In the case that an error is found and it cannot be resolved, the last known
correct block can be used as a substitute for the erroneous block. The substitution can be
Frame De-Formatting and Bit Stream
Unpacking
Mantissa De-Quantization Spectral Envelope Decoding
Encoded Spectral Envelope
Decode Bit Allocation
Encoded AC-3 bit stream
Synthesis Filter Bank
Mantissa
Quantized Mantissa
Exponents
PCM Time Samples
done several times until a consistent block is found, unless there are extended error
conditions which forces the decoder to mute or revert to analog. [5]
After the synchronization and error checking, the data can now be unpacked. This
process is simply a reverse of the frame formatting stage. All audio data information are
taken out to recover the bit information.
2.4.2. Spectral Envelope Decoding
The purpose of the spectral envelope decoding block is to recover the exponents
from the original encoded data input after it has undergone the frame de-formatting stage.
This will allow the data to be analyzed together with the de-quantized mantissa in the
synthesis filter bank to get the original PCM samples.
2.4.3. Decode Bit Allocation
The result of the decode bit allocation block together with the quantized mantissa
are analyzed to get the raw mantissa information. This block uses transmitted
intermediate result for less decoding time. It also allows modification of the derived bit
allocation for computation of each channel one at a time.
2.4.4. Mantissa De-Quantization
After the decoded bit allocation process, the mantissa is recovered from the
quantized state. This is done by using the specified information on the size of each
mantissa which can be taken from the decoded bit allocation output.
2.4.5. Synthesis Filter Bank using IMDCT
This is the final stage in the AC-3 decoding process. Before the synthesis filter
bank takes place, data is first converted to fixed point. This means that the mantissa and
exponent are combined to reconstruct the original frequency coefficients. Each of the
channel’s recovered transform coefficients are then inversed transformed back to time
domain using the Inverse Modified Discrete Cosine Transform (IMDCT), windowed and
overlap added to produce the original PCM time samples.
The Synthesis Filter Bank is represented by the following equations:
1
0
'( , ) ( , )cos[ (2 1)(2 1) (2 1) ]
4 4
M
k
X m n Y m k k n k
M
π π−
=
= + + + +∑ n = 0,1,… 2M-1 (3)
'( ) ( ) '( 1, ) ( ) '( , )x n h n M X m n M h n X m n= + − + + n = 0,1,… M-1 (4)
where:
x’(n) = reconstruction signal from the synthesis filter bank
h (n) = window function
2.5 Related Works
This project intends to implement the AC-3 audio coding standard by reviewing related
studies on filters and fast algorithms for efficient implementation. Comparison of filter
bank designs will be the main focus of this project.
2.6.3. The ATSC Standard: Digital Audio Compression (AC-3), Revision A
For technical guidelines on AC-3 encoding and specifications, the ATSC
Standard: Digital Audio Compression Revision A will be used. The AC-3 bit stream is
specified mainly by the syntax and decoder processing, thus the encoder is not precisely
defined. The only normative requirement on the encoding algorithm is that it is able to
produce a bit stream that follows the AC-3 syntax. ATSC documentation will be checked
for consistency of the encoder implementation.
2.6.4. FFmpeg Multimedia System
For the source code reference, the AC-3 encoder included in the codec library of the
FFmpeg project will be used. FFmpeg Multimedia System is a complete solution to
record, convert and stream audio and video. It is a trademark of Fabrice Bellard,
originator of the FFmpeg project and is an experimental and developer-driven project.
However, there are no formal releases yet.
The project is made of several components:
• ffmpeg – command line that converts video format to another
• ffserver – http multimedia streaming server for live broadcasts
• ffplay – simple media player
• libavformat – library containing parsers and generators for all common
audio/video formats
• libavcodec – the leading audio/video codec library that contains all
encoders/decoders. Most of codecs were developed from scratch to ensure quality
performances and high code reusability.
It is developed under Linux but it can be compiled under most operating system including
Windows.
Chapter 3
Methodology
The overview of the project is shown in Figure 3-1. For the first part, a reference
code was used which was modified to create a standard encoding algorithm. Different
designs were then implemented to come up with modified encoding algorithms. Finally,
quality performance of these encoder designs were then compared and analyzed using
different objective and perceptual tests to arrive at an optimum code which will be
selected among all of the implementations.
3.1 Design
The AC-3 encoder consists of the following information:
• AC3_encode_init – sets the correct parameters for the AC-3 encoder
• AC3_encode_close – outputs the correct AC-3 bit stream
• AC3_encode_frame – contains the actual AC-3 encoding processing
The main encoding process takes place in the AC3_encode_frame. It contains several sub
functions that perform the different encoder processing requirements which can be
summarized in Figure 3-2. These include:
• mdct512 – calculates the MDCT of the 512 input time samples
• compute_exp_strategy– along with its sub blocks, this computes the exponent
strategy that will be used by the encoder
• compute_bit_allocation – along with its sub blocks, calculate the number of
bits needed by the encoder using mantissa information
• output_frame_header, output_frame_end, output_audio_block – frame
packing and bit stream formatting
Figure 3-1 AC-3 Encode Frame
Since the project deals with comparing different filter bank designs, it is
necessary that the other parts of the encoder will not be modified so as not to affect its
performance. Therefore, the different designs directly refer to the filter bank
implementations. As mentioned previously, the AC-3 filter bank uses an oddly-stacked
TDAC. The general equation that characterizes TDAC is shown below:
( ) ( )( )
( )( )
0
0
cos ( ) 1 cos
2 2
sin ( ) 1 sin
2 2
( )
( )
cos( ) sin( )
2 2
k k
n
k
n
k
m mK
X m x n h P n w n n
m mK
x n h P n w n n
where K blocksize
x n time domain sequence
X m frequency data of band k in time block m
m m
and swit
π
π
π π
∞
=−∞
∞
=−∞
   
= − + − +   
   
   
+ − + − +   
   
=
=
=
=
∑
∑
( )( ) ( )( )0 0
0
cos sin
( ) & ( )
k k
ches deciding which path according to block number
w n n and w n n analysis and synthesis filters
n phase delay
h n f n windows with size P
+ + =
=
=
By setting wk = (2 ( 1/ 2) )k Kπ + , replacing –r=mK/2-n and performing deductions, we
can rewrite the equation as a general form of MDCT.
exponent_min
output_frame_header
encode_exp
compute_exp_strategy
compute_bit_allocation
mdct512
output_frame_end
output_audio_block
calc_exp_diff
bit_alloc_masking
bit_alloc compute_mantissa_size
AC3_encode_frame
The analysis filter bank is the processing of 512-time sample from time to
frequency domain where the current 256 input time samples are concatenated with the
256 time-sample of the previous set. The filter bank is completed in two major processes:
windowing and MDCT implementation.
The PCM samples are windowed through a normalization procedure by using a
Kaiser-Bessel window with 50% overlap between adjacent blocks. The Kaiser-Bessel
window can be described as:
0
0
[ , ]
[ ] , 0
2
[ , ]
n
p
KBD Left n
p
w p
N
W n n
w p
α
α
=
−
=
= ≤ <
∑
∑
[ ]
2
/ 4
1.0
/ 4 N
where [ , ] , 0 n
2
o
o
n N
I
N
w p
I
πα
α
πα
 −  −  
   = ≤ ≤
2
0
and
( / 2)
[ ]
!
k
o
k
x
I x
k
∞
=
 
=  
 
∑
For AC-3, α =4. The window characteristic described above is shown graphically
in Figure 3-2. The figure represents the transform window sequence whose values are
specified by the ATSC standard shown in Table 3-1.
Figure 3-2 AC-3 Window Sequence
Due to the window specification, the design modification now depends on the
type of MDCT implementation. This can be classified as (1) Direct (2) Indirect (3)
Recursive. Direct method, or traditional method, is the straightforward execution of the
MDCT block. The indirect method refers to the use of fast algorithms such as Fast
Fourier Transforms (FFT). The third method simply refers to algorithms that are
recursive in nature.
Table 3-1 AC-3 Transform Window Sequence (w[10A+B])
3.1.1 Direct Method
Given the time domain signal samples in a signal block ( )x n where n=0 to N-1
windowed by an N sample real Kaiser-Bessel derived window w(n) to obtain x(n) , the
complete MDCT of x(n) is:
1
0
2 2
( ) ( )cos[ (2 1)(2 1) (2 1)(1 )] k=0,1,2,...(N/2)-1
4 4
1 for the first short transform
0 for the long transform
1 for the second short transform
N
D
n
X k x n n k k
N N
π π
α
α
−
=
= − + + + + +
−

= 
+
∑
Removing the scale factor and setting α =0 simplifies the MDCT to:
1
0
(2 1 /2)(2 1)
( ) ( )cos k=0,1,2,...(N/2)-1
2
N
n
n N k
X k x n
N
π−
=
 
 
 
+ + +
= ∑
3.1.2 Indirect Method using DCT-IV
The direct or traditional method can be accomplished by K/2 points DCT-IV by
using a series of trigonometric identities:
If cos( + )=cos cos sin sin ,
cos (2 1) 2(2 1 ) 1 cos (2 1)(2 1)
4N 4
sin (2 1) 2(2 1 ) 1 sin (2 1)(2 1)
4N 4
X(m,n) input mth block signal consisting of 256
k N n k n
N
k N n k n
N
α β α β α β
π π
π π
   
     
   
   
     
   
−
+ − − + = − + +
+ − − + = + +
=
D
new input points and 256 old input points
Y(m,k) = output X ( )
h(n) = window function
x(n) = X(m,n)h(n)
The MDCT of x(n) can be rewritten as:
( ) ( ) ( , ) cos 2(2 1)(2 1) cos (2 1)
4 4D
k
X k h n X m n n k k
N
π π 
 
 
= + + +
1
0
1
0
sin 2(2 1)(2 1) sin (2 1) k=0,1,2,...(N/2)-1
4 4
( , ) cos (2 1) ( ) ( , )cos (2 1)(2 1)
4 2
-sin (2 1)
4
N
n
N
n
n k k
N
Y m k k h n X m n n k
N
k
π π
π π
π
−
=
−
=
      
      
      
    
    
    
 

 
− + + +
= + + +
+
∑
∑
1
0
/ 2 1
0
( ) ( , ) sin (2 1)(2 1) k=0,1,2,...(N/2)-1
2
( , ) cos (2 1) [ ( ) ( , ) (
4
N
n
N
n
h n X m n n k
N
Y m k k h n X m n h N
π
π
−
=
−
=
  
   
  
 
 
 
+ +
= + −
∑
∑
/ 2 1
0
1 ) ( , 1 )]cos (2 1)(2 1)
2
-sin (2 1) [ ( ) ( , ) ( 1 ) ( , 1 )] sin (2 1)(2 1) k=0,1,2,...(N/2)
4 2
N
n
n X m N n n k
N
k h n X m n h N n X m N n n k
N
π
π π−
=
  
  
  
    
    
    
− − − − + +
+ + − − − − + +∑ -1
The final analysis filter bank equation can be recognized as a DCT-IV whose
implementation structure is illustrated above. This implementation structure is illustrated
in Figure 3-3.
Figure 3-3 256-point DCT-IV Implementation Structure
0 256 0
1 DCT 1
: :
254 254
255 255
z-1
z-1
z-1
z-1
z-1
h(0)
z-1
z-1
z-1
h(1)
h(254)
h(255)
-1
-1
-1
-1
Co Y(m,0)
C1 Y(m,1)
C254 Y(m,254)
C255 Y(m,255)
x(n)
3.1.3 Indirect Method using FFT
Rewriting the direct implementation of the MDCT gives the form shown by the equation
below:
0
2 1 2 21( )
2 2
0
[ ] Re [ [ ] [ ] ]
n knNj n k j j
N N N
n
X k e x n w n e e
π π π−− + − −
=
  
=  
  
∑
This can be implemented using a series of steps: pre-twiddle, N-point FFT, post-twiddle.
The pre-twiddle step is the complex multiplication of the input samples by
2
2
n
j
N
e
π
−
. A 512-
point FFT is then performed on the pre-twiddled data. The post-twiddle step then takes
the real part of the transformed data and multipliying it by a factor
0
2 1
( )
2
j n k
N
e
π
− +
.
( ) ( )( ) ( ) ( )( )
2 ( 1/ 8) /
2 ( 1/ 8) /
( ' )
'
( 3 / 4) 0 k n/4-1
'( )
( / 4) n/4 k n-1
' 2 ' 2 1 ' / 2 2 ' / 2 2 1 , 0 k n/4-1
'
(2 ) Re( '( ))
(2 1) Im( '
m k
m m
k
i k n
k k
i m n
n
FFT f
F F e
f k n
f k
f k n
f f k f n k i f n k f n k
f f e
F
F m F m
F m F
π
π
+
+
=
=
− + ≤ ≤
=
− ≤ ≤
= − − − + + − − + ≤ ≤
=
=
+ =



/ 4 1
0 to n/4-1
)m
m
− −
=



3.1.4 Recursive Method
The direct method can also be implemented recursively. One traditional recursive method
is as follows:
Let ' ( / 4) and '' ( 3 / 4)n n N n n N= + = −
[ '] [ ' / 4] for n'=N/4,...N-1
[ ''] [ '' 3 / 4] for n''=0,...N/4-1
y n x n N
y n x n N
= −
= +
The direct implementation can then be written as,
/ 2 1
0
1
[ ] ( [ ] [ 1 ])
(2 1)
2cos
2
(2 1)( 1)
[ ] [ ]cos
N
n
u n y n y N n
n
N
n k
U k u n
N
π
π−
=
= − − −
+
+ +
= ∑
[ ] [ 1] [ ]X k U k U k= + +
This previous equations also represents the DCT-IV. By applying trigonometric
identities, we get
/ 4 / 4 1 / 4 2
/ 4 / 4 1 / 4 2
[ ] [ ] ( 1) [ / 2 1 ]
[ ] cos ( [ ] [ 1]) 2cos [ ] [ ]
2
[ ] sin ( [ ] [ 1]) 2cos [ ] [ ]
2
k
k
N k k k N N
k
N k k k N N
p n u n u N n
D k p j p j D k D k
S k p j p j S k S k
θ
θ
θ
θ
− −
− −
= + − − −
= − − + −
= − − + −
Finally, the output X[n] can be derived by using these equations:
/ 2
/ 4 1 / 4 1
( 1) / 2
/ 4 1 / 4 1
[ ] ( 1) ( [ ] [ 1])for k is even
[ ] ( 1) ( [ 1] [ ])for k is odd
k
n n
k
n n
X k D k S k
X k D k S k
− −
−
− −
= − + +
= − + +
Clenshaw’s Recurrence Formula
The proposed algorithms based on a traditional recursive implementation can be
formally obtained using Clenshaw’s recurrence formula which is an elegant and efficient
way of evaluating the sum of the coefficients.. The formula states that the sum of a given
function f(x) can be obtained using this equation:
1 1 2 2 1 3( ) ( ) ( 1, ) ( ) ( )N n N N N Nf x c F x N x F x y F x yβ− − − − − −= − − −
Applying it for the MDCT implementation, let
1
2
k k
M
π
θ
 
= + 
 
and define
2 1
1 2
0
( ) 2cosn k N N
a a
a x n a aθ
− −
− −
= =
= + −
( ) ( ) ( )1 2 2 1 3( ) ( 1) N k N k N N k NX k x N F F a F aθ θ θ− − − − −= − + −
Appplying mathematical transformation to the direct method finally gives the output X(k)
as:
1 2( ) cos ( 1) cos ( 1)
2 2
k k
N NX k M a M a
θ θ
− −
   
= − − + +   
   
The implementation structures of the traditional recursive transform and the recursive
solution using Clenshaw’s recurrence formula are shown in Figures 3-4(a) and 3-4(b)
respectively.
(a)
(b)
Figure 3-4 (a) Implementation structure for (a) traditional recursive method (b)
recursive based on Clenshaw’s recurrence formula
3.2 Implementation
FFmpeg Multimedia System was the main reference used for implementing the
designs using the C programming language. The code modification was done in an IDE
environment using Microsoft Visual Studio 2005.
However, the reference can only be compiled in Windows using selected software
that can complete the actual compiling and producing of the win32 binaries. Thus, the
compilation was done in a command line Linux environment using the MingGW and
Minimal SYStem (MSys) system with an integrated gcc compiler. MSys is a branch of
Cygwin used in Windows OS for creating configuration and packages. It also offers user
environment for MinGW development and allows users of MinGW version of GCC to
port and build packages in a GNU familiar way without the UNIX complexities brought
about by Cygwin.
The implementation procedure involves a direct application of the different
methods for the MDCT processing block.
3.3 Testing
There were initial tests done to evaluate the performance of the designs. A
preliminary listening test was already implemented to determine the test samples to be
used for the objective test. A formal listening test will be done during the last phase of the
project.
Figure 3-5 Flowchart of the Testing Procedure
3.3.1 Preliminary Test
The main objective of the preliminary test was to determine the type of audio
samples that are best suited for the objective tests. Different types of audio vary in
frequency contents. Thus to produce rational results for the objective analysis, worst case
scenario must be considered. This scenario implies that the encoded audio sample should
be far from the original PCM audio. If the difference between the quality of the encoded
audio sample and the original audio is almost unnoticeable, we can immediately expect
that the results will be almost identical.
The International Telecommunication Union [ITU-R BS.1284] provides the
appropriate guidelines and scale to be used for quality assessment. In determining the
relative difference of two samples with quality far from each other, the seven-grade
comparison scale shown in the figure below is recommended.
Grade Comparison
3 Much Better
2 Better
1 Slightly Better
0 The Same
-1 Slightly Worse
-2 Worse
-3 Much Worse
The system used was the Multiple Stimulus with Hidden Reference and Anchors
(MUSHRA) method. It is a double blind multiple stimulus test method with hidden
Preliminary
Test
Objective Tests Formal
Listening Test
Traditional
and Modified Designs
Listening and
Objective Test Results
reference. This setup involves a reference audio that can be compared against multiple
test samples, wherein one of the test samples is the actual same copy of the reference
sample unknown to the listener. Ideally, the user must be able to give a grade of zero
since one of the test samples is its duplicate.
For this project, the reference audio used was the original PCM audio. One of the
test samples was a copy of the original and the rest of the test samples were the different
designs used for implementing the AC-3 encoding algorithm.
The preliminary listening test was then performed using three different PCM
audio (plain speech, a pop music, and an instrumental song) using randomly selected AC-
3 encoder designs (direct method, MDCT using DCT-IV and MDCT using FFT).
In addition to this, preliminary objective test was done to assess the frequency
response of the selected audio.
3.3.2 Objective Tests
After getting the results of the preliminary listening test, the audio sample was
selected and the objective tests were performed using MATLAB 7.0. However, since
MATLAB is unable to directly read AC-3 formats, the encoded audio sounds were first
recorded to raw PCM format that MATLAB recognizes.
The software used was Audacity, an open source program capable of recording
sounds without the need for microphone and speaker set-up. This ensures that there is no
degradation in audio quality and the characteristics of the encoded audio format can truly
be analyzed.
Moreover, stereo separation into left and right channels was done in order to study
the correlation and frequency response of the encoded AC-3 audio. The autocorrelation
of the original PCM audio was plotted against the correlation between the original audio
and each design. This will illustrate whether or not the encoded AC-3 audio and the
original PCM are highly correlated.
The frequency response of each design was then determined to verify if there are
noticeable differences compared to that of the original PCM audio. The flowchart below
summarizes the procedure done for the objective tests.
Figure 3-6 Flowchart of the Objective Test Procedure
3.3.3 Formal Listening Test
Formal listening test is considered the ultimate measurement of audio quality. The
test was done in a similar manner to that in the preliminary listening tests.
However, the scale used for the formal listening test is the five-grade impairment
scale. This is defined by the [ITU-R BS.562-3] as the scale used for systems with small
impairments. This is related to the five-grade quality scale as shown below. The test
method used is still a double blind multiple stimulus test method with hidden reference.
Grade ITU-R Grade
5 Imperceptible
4 Perceptible, but annoying
3 Slightly annoying
2 Annoying
1 Very annoying
3.3.4 Playback
The open source encoder AC3Filter developed by Alexander Vigovsky on August
2006 [9] will serve as the AC-3 main decoder. AC3Filter is a high quality free
DirectShow AC-3 filter that allows decoding of videos with AC3-encoded surround
audio.
The source is composed of the filter and an audio processing library. The filter
implements all interfaces and logic required by DirectShow while the audio processing
Objective Analysis
- frequency response and correlation
Objective Test Results
Channel Separation
- to left and right channels in MATLAB
AC-3 recording
- to PDCM audio using Audacity
Traditional and Modified Designs
library implements all audio decoding and processing algorithms. It supports
configuration settings that allow control of audio sound intensity and can decode AC-
3/DTS/MPEG1/2 Audio Layer I/II formats. Some of the features include decomposition
to six channels, full information about the audio track format, per channel amplification
for all input/output channels and per channel delays.
3.3.5 Listening Conditions
The listening tests were performed with the listeners having direct control on the
listening conditions. The direct control includes being allowed to repeat the audio
samples and change the audio volume level for. They were asked to use headphones
throughout the duration of testing.
Chapter 4
Results and Analysis
4.1 Objective Measurements
The preliminary objective test done showed that the different encoder designs
produced an encoded audio with frequency characteristics similar to that of the original
PCM input audio.
The test was done on three different audio inputs namely pop music, pure speech, and
an instrumental song. Initially, three encoder designs were considered. These are the
direct implementation, and two modified encoder designs. The first modified design is
the filter bank implementation based on DCT-IV and the other one is based on FFT. The
frequency responses of the designs show minimal differences. The low frequency
components of the encoded audio show unnoticeable changes from the original input
PCM but some high frequency artifacts were present after the encoding process. The
encoder design with an MDCT implementation using FFT showed more visible artifacts
than the other designs. The frequency response of the original instrumental audio PCM
input (electric piano) is plotted in Figure 4-1 followed by those of the modified designs in
Figure 4-2.
(a) (b)
Figure 4-1 Frequency Response of the Original Input
for (a) Left (b) Right Channels
(c) (d)
(e) (f)
(g) (h)
Figure 4-2 Frequency Response of the Different Designs using:
(c) - (d) Direct Method , (e) –(f) Indirect Method DCT-IV ,
(g) - (h) Indirect Method FFT for both left and right channels
The crosscorrelation of the original input PCM and the encoded audio for each design
was also determined and is illustrated in Figure 4-3.
(a) (b)
Figure 4-3 Autocorrelation of the Original Input
for (a) Left (b) Right Channels
(c) (d)
(e) (f)
(g) (h)
Figure 4-4 Correlation of the Different Designs and Original Input PCM
using: (c) - (d) Direct Method , (e) –(f) Indirect Method DCT-IV ,
(g) - (h) Indirect Method FFT for both left and right channels
The results above show that there is a high correlation between the encoded audio and
the original input PCM. Though frequency response and correlation present a general
insight on the frequency components present in an audio sample and its correlation with
the original input, it shows little or no information on the quality of the encoded audio
aside from the noticeable appearance of frequency artifacts.
4.2 Audio Test Samples
The preliminary listening test done reveal that the listener was not be able to
recognize the impairments among the different designs immediately for ordinary pop
music, but it became more obvious when the input audio sample used contains high
frequency components, and thus, high pitch. This is in line with the result of the
frequency response mentioned earlier when frequency artifacts appear after encoding.
It is also more noticeable when the audio sample is made up of a single instrument or
speech. The ensemble of musical instruments masks the impairments brought about by
the encoder design. Table 4-1 shows the average grade given by 10 listeners.
Test Sample Encoder Design Arithmetic Mean
Pop Music Original -0.3333
Direct 0
Indirect using DCT 0.6667
Indirect using FFT 1.125
Speech Original 0.3333
Direct -1
Indirect using DCT -1.1111
Indirect using FFT -1.5556
Table 4-1 Result for the preliminary listening tests
Having realized this, the selections of critical material used are listed in Table 4-2.
The test samples are composed primarily of different musical instruments expect for
speech and pop music.
Input Test Samples
1. CLARINET
2. DRUMSET
3. GUITAR_CONCERT
4. GUITAR_ELECTRIC
5. PIANO_CONCERT
6. PIANO_ELECTRIC
7. POP MUSIC
8. SAXOPHONE
9. SPEECH
Table 4-2 List of critical materials to be used
Chapter 5
Project Schedule
5.1 Revised Gantt Chart
Week no.
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16
A
B
C
D
E
F
G
H
Tasks
A Implementation and Testing of the analysis filter bank
B Implementation and testing other encoder blocks
C Implementation and Testing of the 1st
and 2nd
modified filter bank designs
D Implementation and Testing of the 3rd
and 4th
modified filter bank designs
E Listening tests and comparative analysis of the designs
F Testing of other optional AC-3 compression features
G Final Comparative analysis and Listening Tests
H Documentation
5.2 Halfway-Point Deliverables
The standard integrated code, test results and documentation for each of the following
individual blocks in the encoder:
• Implementation of four different filter bank designs
• Objective test results
• Preliminary listening test results
5.3 Final Deliverables
Task
The final deliverables include four filter bank designs and analysis of AC-3 audio
compression feature. The comparative analysis of four different filter bank
implementations before the conclusion of the optimum final code will also be presented,
stating grounds for selecting the design. All functional and listening test results will also
be presented along with the full documentation of the project implementation.
Bibliography
[1] C. Liu and W. Chang. “Audio Coding Standards”. National Chiao Tung
University, Taiwan.
[2] Digital Audio Formats. http://www.teamcombooks.com/mp3handbook/12.htm
Feb 10, 2007
[3] Dolby AC-3 Audio Coding. http://www.fedele.com/website/hdtv/dolbyac3.htm
Feb 10, 2007
[4] Dolby AC-3. http://www.mp3-tech.org.htm. Feb 10, 2007
[5] M. Davis. “The AC-3 Multichannel Coder”. Dolby Laboratories Inc., San
Francisco, 1993.
[6] C. Todd, G. Davidson, M. Davis, L. Fielder, B. Link, and S. Vernon. “AC-3:
Flexible Perceptual Coding for Audio Transmission and Storage”. Dolby
Laboratories Inc., San Francisco, 1994.
[7] A. Alvarez, S. George, H. Yang, and K. Das. “SGS-Thomson DSP D950-Core
Audio Compression”. DSP R&D Centre SGS-Thomson Microelectronics”.
[8] Y. Chen, C. Tsai, and J. Wu. “Fast Time-Frequency Transform Algorithms and
their Applications to Real-Time Software Implementation of AC-3 Audio Codec”.
National Taiwan University.
[9] AC3Filter Project. http://ac3filter.net/ Feb 10, 2007
[10] AC3Decode. http://www.users.on.net/%7Ersobon/ac3dec.html Feb 10, 2007
[11] Z. Zhao. “Efficient Implementation Structures of AC-3’s Filter Banks.” Hangzhou
Institute of Electronic Engineering, Hangzhou China.
[12] S. Lee and C. Liu. “Transformation from 512-point transform coefficients to 256-
point transform coefficients for Dolby AC-3 decoder”

More Related Content

What's hot

Professional Skills Highlights
Professional Skills HighlightsProfessional Skills Highlights
Professional Skills HighlightsVideoguy
 
FPGA based JPEG Encoder
FPGA based JPEG EncoderFPGA based JPEG Encoder
FPGA based JPEG EncoderIJERA Editor
 
Raspberry pi glossary of terms dictionary extended
Raspberry pi glossary of terms dictionary extendedRaspberry pi glossary of terms dictionary extended
Raspberry pi glossary of terms dictionary extendedWiseNaeem
 
Wonho Park_20151209
Wonho Park_20151209Wonho Park_20151209
Wonho Park_20151209Wonho Park
 
Modern SATCOM Forward Error Correction
Modern SATCOM Forward Error CorrectionModern SATCOM Forward Error Correction
Modern SATCOM Forward Error CorrectionGregory Schoenig
 
Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1kavya R
 

What's hot (10)

Professional Skills Highlights
Professional Skills HighlightsProfessional Skills Highlights
Professional Skills Highlights
 
Gg2612711275
Gg2612711275Gg2612711275
Gg2612711275
 
FPGA based JPEG Encoder
FPGA based JPEG EncoderFPGA based JPEG Encoder
FPGA based JPEG Encoder
 
Raspberry pi glossary of terms dictionary extended
Raspberry pi glossary of terms dictionary extendedRaspberry pi glossary of terms dictionary extended
Raspberry pi glossary of terms dictionary extended
 
Wonho Park_20151209
Wonho Park_20151209Wonho Park_20151209
Wonho Park_20151209
 
Modern SATCOM Forward Error Correction
Modern SATCOM Forward Error CorrectionModern SATCOM Forward Error Correction
Modern SATCOM Forward Error Correction
 
Performance Analysis of Various Video Compression Techniques
Performance Analysis of Various Video Compression TechniquesPerformance Analysis of Various Video Compression Techniques
Performance Analysis of Various Video Compression Techniques
 
proposal
proposalproposal
proposal
 
I010435659
I010435659I010435659
I010435659
 
Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1Be cps-18 cps13or23-module1
Be cps-18 cps13or23-module1
 

Similar to AC-3 Encoders

10.1.1.184.6612
10.1.1.184.661210.1.1.184.6612
10.1.1.184.6612NITC
 
Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2engLamaMahmoud
 
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...csandit
 
Applications - embedded systems
Applications - embedded systemsApplications - embedded systems
Applications - embedded systemsDr.YNM
 
Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)Jonah McLeod
 
Performance evaluation of mpeg 4 video transmission over ip-networks
Performance evaluation of mpeg 4 video transmission over ip-networksPerformance evaluation of mpeg 4 video transmission over ip-networks
Performance evaluation of mpeg 4 video transmission over ip-networksAlexander Decker
 
11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...
11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...
11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...Alexander Decker
 
Implementation of Low Bit Rate Vocoder for Speech Compression
Implementation of Low Bit Rate Vocoder for Speech CompressionImplementation of Low Bit Rate Vocoder for Speech Compression
Implementation of Low Bit Rate Vocoder for Speech CompressionIRJET Journal
 
Low power fpga solution for dab audio decoder
Low power fpga solution for dab audio decoderLow power fpga solution for dab audio decoder
Low power fpga solution for dab audio decodereSAT Publishing House
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...Christopher Diamantopoulos
 
Chapter 5 - Data Compression
Chapter 5 - Data CompressionChapter 5 - Data Compression
Chapter 5 - Data CompressionPratik Pradhan
 
Copy of colloquium 3 latest
Copy of  colloquium 3 latestCopy of  colloquium 3 latest
Copy of colloquium 3 latestshaik fairooz
 
J03502050055
J03502050055J03502050055
J03502050055theijes
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148IJRAT
 
Introduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainIntroduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainVideoguy
 

Similar to AC-3 Encoders (20)

10.1.1.184.6612
10.1.1.184.661210.1.1.184.6612
10.1.1.184.6612
 
Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2Video compressiontechniques&standards lamamahmoud_report#2
Video compressiontechniques&standards lamamahmoud_report#2
 
H263.ppt
H263.pptH263.ppt
H263.ppt
 
A04840107
A04840107A04840107
A04840107
 
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
 
Ai Brt
Ai BrtAi Brt
Ai Brt
 
Applications - embedded systems
Applications - embedded systemsApplications - embedded systems
Applications - embedded systems
 
Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)
 
Performance evaluation of mpeg 4 video transmission over ip-networks
Performance evaluation of mpeg 4 video transmission over ip-networksPerformance evaluation of mpeg 4 video transmission over ip-networks
Performance evaluation of mpeg 4 video transmission over ip-networks
 
11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...
11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...
11.performance evaluation of mpeg 0004www.iiste.org call for-paper video tran...
 
Implementation of Low Bit Rate Vocoder for Speech Compression
Implementation of Low Bit Rate Vocoder for Speech CompressionImplementation of Low Bit Rate Vocoder for Speech Compression
Implementation of Low Bit Rate Vocoder for Speech Compression
 
F5242832
F5242832F5242832
F5242832
 
Low power fpga solution for dab audio decoder
Low power fpga solution for dab audio decoderLow power fpga solution for dab audio decoder
Low power fpga solution for dab audio decoder
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
 
Chapter 5 - Data Compression
Chapter 5 - Data CompressionChapter 5 - Data Compression
Chapter 5 - Data Compression
 
Copy of colloquium 3 latest
Copy of  colloquium 3 latestCopy of  colloquium 3 latest
Copy of colloquium 3 latest
 
J03502050055
J03502050055J03502050055
J03502050055
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
 
M.sc.iii sem digital image processing unit v
M.sc.iii sem digital image processing unit vM.sc.iii sem digital image processing unit v
M.sc.iii sem digital image processing unit v
 
Introduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainIntroduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag Jain
 

Recently uploaded

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 

Recently uploaded (20)

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 

AC-3 Encoders

  • 1. Comparative Study of Analysis Filter Bank Designs for AC-3 Encoding Undergraduate Student Project Mark Joseph L. Tan 2002-22034 B.S. Computer Engineering Adviser Franz A. de Leon, MS EE University of the Philippines, Diliman
  • 2. Approval Sheet In partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Engineering, this project entitled “Comparative Study of Analysis Filter Bank Designs for AC-3 Encoding” prepared and submitted by Mark Joseph L. Tan, is hereby recommended for approval. __________________ Franz de Leon Adviser Accepted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Engineering Approved by: __________________ Panel __________________ Panel __________________ Reader __________________ EEE190 Handler
  • 3. Abstract Several high-fidelity digital audio formats have been developed with the challenge of developing audio coding techniques intended for low cost, limited bandwidth and storage capacity. AC-3 is one of these international standards. It has been widely used as the audio coding standard for laserdiscs, cinemas and DVD films. AC-3 has been accepted by the Advanced Television Systems Committee (ATSC) as the audio coding standard for US Grand Alliance High Definition Television (HDTV) system. This project is a comparative study of analysis filter bank designs for AC-3 encoding. The standard algorithm will first be implemented in stereo mode using software. Different optimization techniques on the analysis filter bank designs will then be performed. Finally, listening and speed performance tests will be done to compare the optimized codes with the standard implementation. The project will aid in future designs concerning the AC-3 audio coding standard, with the goal of improving performance and efficiency for high-quality digital audio compression.
  • 4. Table of Contents 1. Introduction 1.1. Background of the Project 1.2. Project Objectives 1.3. Overview 2. Literature Review 2.1. Standard Digital Audio Format 2.1.1. Standard Formatss 2.1.2. MPEG-Based Proprietary Formats 2.1.3. Non-MPEG Proprietary Formats 2.2. AC-3 Audio Coding Standard 2.2.1. History and Development 2.2.2. AC-3 Compression Features 2.2.3. AC-3 Comparative Advantages 2.2.4. AC-3 Bit Stream 2.3. AC-3 Encoding Algorithm 2.3.1. Analysis Filter Bank 2.3.2. Spectral Envelope Encoding 2.3.3. Encode Bit Allocation 2.3.4. Mantissa Quantization 2.3.5. Frame Formatting and Bit Stream Packing 2.4. AC-3 Decoding Algorithm 2.4.1. Frame De-Formatting and Bit Stream Unpacking 2.4.2. Spectral Envelope Decoding 2.4.3. Decode Bit Allocation 2.4.4. Mantissa De-Quantization 2.4.5. Synthesis Filter Bank using IMDCT 2.5 Related Works 2.5.1 The ATSC Standard 2.5.2 FFmpeg Multimedia System 3 Methodology 3.1 Design 3.1.1 Direct Method 3.1.2 Indirect Method using DCT-IV 3.1.3 Indirect Method using FFT 3.1.4 Recursive Method 3.2 Implementation 3.3 Testing 3.3.1 Preliminary Test 3.3.2 Objective Tests 3.3.3 Formal Listening Test 3.3.4 Playback 3.3.5 Listening Conditions 4 Results and Analysis 4.1 Objective Measurements
  • 5. Chapter 1 Introduction Digital Audio Compression (AC-3) is the 3rd generation audio coding algorithm developed by the Digital coder group at Dolby Laboratories which eventually became the audio coding standard for the US Grand Alliance HDTV system. The primary motivation in developing the AC-3 audio coding standard was the need for a superior multi-channel audio coder. AC-3 became one of the most successful proprietary digital audio formats available in the market. It is usually integrated with video applications and is rarely used as stand alone. Thus, even though it has been around since 1990’s, it is not as prominent as the other consumer audio formats such as MPEG Layer III (mp3). AC-3 has a wide variety of applications. It can found in laserdiscs, digital cable and even satellite transmissions. But it is famous for being the audio format used in cinemas and DVD’s. Like other popular codecs, it has a perceptual coding algorithm that uses minimal digital information and lower data rates which allows minimum degradation of quality. Thus, the sound almost exactly the same as the original sound when decoded for playback providing true surround. Currently, there are several audio decoders that are capable of AC-3 playback. However, it is very rare to find audio encoders and decoders that are solely dedicated to handling the AC-3 format. 1.1 Background of the Project Figure 1-1 Project Overview The project is a comparative study of the analysis filter bank designs for the AC-3 encoding which will be subdivided into two major parts. The first part would be the software development of the AC-3 audio coding standard’s encoding algorithm while the second part would be the implementation of analysis filter bank designs that will modify the traditional algorithm. The project overview is shown in Figure 1-1. Traditional Encoding Algorithm Modified Encoding Algorithms Final Code Comparative Analysis Objective Analysis
  • 6. For the development of AC-3 encoding algorithm, each of the different coding blocks of this audio format will be identified, implemented and tested. Modification will then be carried out in order decrease the amount of computations, thus, minimizing the complexity. This will be verified by testing procedures which will include listening tests to assess the quality of the audio file and tests to verify computational complexity. The implementation of the encoding algorithm as well as the optimization techniques will be based on various researches that deal with the AC-3 audio coding standard. 1.2 Project Objectives One of the major factors considered in creating perceptual audio coders is the need for real-time processing. The speed of the encoding algorithm is directly related to the computational complexity and the number of cycles it takes to process an audio block. Since a large part of the calculation occurs in the filter bank, the interest of the project lies on having a comparative analysis of different filter bank designs for the AC-3 encoding algorithm. This will be done by implementing each block of the coding process to create an encoder that operates in stereo mode and applying different methods for optimizing the performance. The project aims to optimize a traditional AC-3 encoding algorithm to reduce its computational complexity. Lastly, it is of great importance to characterize AC-3 compression features since this will be the first AC-3 related project to be conducted by DSP Laboratory. The project aims to look into particular AC-3 features that make it one of the most powerful perceptual audio standards. 1.3 Overview The components of the AC-3 encoding algorithm can be grouped into five major blocks. These are the: (1) analysis filter banks which perform the time-frequency transformation of the input signal, (2) spectral envelope encoding where the power spectral density is analyzed, (3)-(4) bit allocation and mantissa quantization that use psychoacoustic models to manipulate the mantissa component of the input signal and (5) formatting & bit stream packing that prepares the audio data for encoding. [1] Several of the AC-3 related studies that have been carried out focused on implementing the encoding algorithm and improving its performance such as those in [8] [11] [12]. The emphasis of the optimization methods have been put mainly on the analysis filter bank which aims to minimize complexity by decreasing the number of computations needed by the encoding algorithm. This is because it is one of the most calculation intensive processes in the algorithm. Thus, this project will compare the optimization methods on the analysis filter bank design.
  • 7. The project will be an implementation of the AC-3 audio coding standard’s encoding algorithm using software. The entire standard algorithm along with the analysis filter bank designs will be implemented using the C/C++ programming language. Testing will be done using two methods, first would be objective tests that will measure accuracy and speed to check the number of computations made. Second would be subjective audio listening tests to examine quality and check if there are noticeable differences on the generated AC-3 audio format.
  • 8. Chapter 2 Literature Review 2.1 Standard Digital Audio Formats The introduction of the audio compact disc (CD) technology in 1982 set off the transformation of audio media from analog to digital format. The single purpose of CD back then was to hold music in digital form. However, huge amount of data is required to represent the digital audio file which made it necessary to develop audio coding techniques for efficient transmission and storage. [1] The goal of these coding techniques is to efficiently use the least amount of bits to represent a digital audio file without making any obvious changes in the original data through transparent coding. Audio coding technology has experienced tremendous development that it has become possible to successfully remove as much as 90% of audio data during compression without noticeable changes in the original from. There are several international and commercially available audio coding standards which can loosely be classified as Standard Formats, MPEG-Based Proprietary Formats, and Non-MPEG Proprietary Formats. Table 2-1 shows some examples of each. [2] Standard Formats MPEG-Based Proprietary Formats Non-MPEG Proprietary Formats Pulse Code Modulation (PCM) AT&T’s a2b AC-3 Differential PCM MP4 EPAC Adaptive Differential PCM Liquid Audio Windows Media Audio (WMA) u-law Compression Apple QuickTime Real Audio ISO/MPEG Family Transparent Audio Compression (TAC) TwinVQ (VQF) Table 2-1 – Audio Coding Standards 2.1.1. Standard Formats PCM is a type of format that can be read by several audio applications. DPCM, APCM and u-law compressions are simple variations of lossy compression formats that store only the difference between consecutive samples instead of the entire information. [2] The Moving Pictures Experts Groups (MPEG) under the International Organization for Standardization (ISO) has developed several standards under the ISO/MPEG Family. These include the MPEG-1 Layers I, II, and III - which can operate in mono or stereo modes; MPEG-2 Layers I, II, III, and AAC - which can operate in multi-channel; and MPEG-4 – which can operate in multi-channels low-bit rates.
  • 9. 2.1.2. MPEG-Based Proprietary Formats The ISO/MPEG Family consists of notable open standards which have become widely used. Because of this, several proprietary formats have been developed which are MPEG-based. These enhancements in the ISO/MPEG standards paved way for the development of sophisticated audio formats that support several features. [2] 2.1.3. Non-MPEG Proprietary Formats There are also several Non-MPEG Proprietary Formats available in the market. These formats offer a wide array of applications and claim to produce quality audio sound that is indistinguishable from the original CD. Among all of these, AC-3 has emerged to be the de facto standard found on the latest generation of laserdiscs, cinemas, and DVD films because of its audio coding technique that makes it capable of delivering high-fidelity audio content. [2] 2.2 AC-3 Audio Coding Standard 2.2.1. History and Development In 1987, the Federal Communications Commission (FCC) formed the Advisory Committee on Advanced Television Service (ACATS) in the United States.[6] ACATS formally started the High Definition Television (HDTV) standardization process with the aim of creating a an excellent multi-channel audio format for HDTV sound. Dolby AC-1, a low-cost delta-modulation based coding technology was developed. By 1989, advances in the audio coding technology allowed the development of AC-2, a higher quality audio coding standard with reduced bit-rate. [6] A real-time implementation of AC-3 was first conducted in multiple DSP cards. By 1991, Star Trek VI became the first AC-3 encoded film. This multi-channel audio compression standard was accepted by ATSC in 1994. 2.2.2. AC-3 Compression Features One of the key features of the AC-3 audio standard is the use some very sophisticated psychoacoustic models which allow channels to be combined and separated. By using a global bit allocation technique, it allows for any channel to borrow bits which are not being used. It has been designed to take maximum advantage of human auditory masking in that it divides the audio spectrum of each channel into narrow frequency bands of different sizes optimized with respect to the frequency selectivity of human hearing. This makes it possible to sharply filter coding noise so that it is forced to stay very close in frequency to the frequency components of the audio signal being coded. By reducing or
  • 10. eliminating coding noise wherever there are no audio signals to mask it, the sound quality of the original signal can be subjectively preserved. [3] AC-3 provides only full range channels so the sounds are much better in terms of quality and allows monophonic up to 5.1 audio channels to provide the necessary channels needed for a digital film audio system. The configuration for a 5.1 channel setup is composed of five full bandwidth channels: Left, Right, Left Surround, Right Surround, Center, and the Subwoofer is shown in Figure 2-1. Details of the AC-3 audio coding standard are also shown in the table below. [4] Figure 2-1 – AC-3 5.1 channel configuration General Characteristics of AC-3 Audio Standard Sample frequency 32 kHz, 44.1kHz, 48 kHz Bit Depth 16 – 24 bits Bit Rate (B) 32 – 640 kbps Frame Size 1536 samples Compression 10:1 DVD Characteristics of AC-3 Audio Standard Maximum Sampling Frequency 192 kHz Maximum Samples per second 192,000 Maximum Bit Depth 24-bits Table 2-2 – AC-3 Characteristics 2.2.3. AC-3 Comparative Advantages Evaluating audio compression is complicated because there is no objective quality parameter. Instead, comparison of AC-3 with two other popular digital audio formats, CD Quality (PCM) and MPEG Layer III (MP3) is shown in the Table 2-3. RIGHT SURROUND LEFT SURROUND RIGHTLEFT CENTER (SUBWOOFER)
  • 11. CD Quality MPEG Layer III* AC-3* Precision Sampling Accuracy (Bits/sample/channel) 16-bit 16-bit 24-bit Gradiation Levels 65,536 65,536 16,777,216 Rate Sampling Rate ( Freq ) 44.1kHz 44.1 kHz 96 kHz Samples per second 44,100 44,100 96,000 Bit Rate in bits/sec 1,411,200 128,000 192,000 Bit Rate in Kbytes/sec 176 16 24 Size MB/min 10 1 1.5 Compression 1:1 1:12 1:10 * Most commonly used configuration Table 2-3 - Compares Attributes of AC-3 versus CD audio and MPEG Layer III Precision values clearly show that AC-3 has a more accurate signal representation in Figure 2-2. The 24-bit depth allows it to have 256 times more gradiation levels than the CD quality. Despite of this, AC-3 can still compress audio data by a factor of 10 due to its very high compression technique. Although this is not evident in stereo mode (since AC-3 is designed to be most advantageous in multi-channel configurations), its compression factor is still comparable to MPEG Layer III. Figure 2-2 – Graphical Comparison of CD Audio with AC-3 2.2.4. AC-3 Bit Stream An AC-3 Bit Stream is composed of a series of frames representing a constant time interval of 1536 PCM samples across all coded channels. Each frame is an independent entity and no data are shared with previous frames aside from the inherent transform overlap. The frame has a fixed size and depends only on the sample rate and coded data rate. It consists of 6 audio blocks with 256 samples each. The structures of the AC-3 Frame and Audio Block are shown in Tables 2-4 and Figure 2-5.
  • 12. SYNC CRC1 SI BSI AB0 AB1 AB2 AB3 AB4 AB5 AUX CRC2 SYNC It maintains synchronization with the encoder / decoder. CRC1 and CRC2 These are means of error detection. SI and BSI The Sync Information (SI) and Bit Stream Information (BSI) include sample rate, data rate, and number of coded channels. AB0-AB5 These are the 6 audio blocks, each representing 256 PCM samples per coded channel. AUX optional aux data for embedding private control or status information Table 2-4 – AC-3 Frame Structure Block Switch Flags Dither Flags Dynamic Range Control Coupling Strategy Coupling Coordinates Exponent Strategy Exponents Bit Allocation Parameters Mantissa Table 2-5 – AC-3 Audio Block Structure 2.3 AC-3 Encoding Algorithm The AC-3 encoding algorithm is composed of five major blocks as shown in Figure 2-7. These are the Analysis Filter Bank, Spectral Envelope Encoding, Encode Bit Allocation, Mantissa Quantization, and Frame Formatting and Bitstream Packing. The block diagram below shows the AC-3 encoding process. Figure 2-7 – AC-3 Encoder Block Diagram 2.3.1. Analysis Filter Bank This block is subdivided into two smaller blocks which are the frequency domain transformation and the block floating points as shown in Figure 2-8. Analysis Filter Bank Mantissa Quantization Spectral Envelope Encoding Exponents Encode Bit Allocation PCM time samples Frame Formatting and Bitstream Packing Quantized Mantissa Mantissa Encoded Spectral Envelope Encoded AC-3 bit stream
  • 13. Figure 2-8 – Analysis Filter Bank Block Diagram Frequency Domain Transformation using MDCT The PCM time samples are best examined in the frequency domain due to the frequency masking properties of human hearing. To transform the signal from a sequence of PCM time samples to a sequence of frequency coefficient blocks, a 512-point Modified Discrete Cosine Transform (MDCT) with 50% overlap is first performed. The result is then multiplied by a time window to transform the input signal to frequency domain. This MDCT is based on the idea of Time Domain Alias Cancellation (TDAC) introduced by Princen and Bradley. [11] Normally, the analysis filter bank uses the 512-point MDCT (long transform). But in the event of transient signals, this method can be improved by using two 256-point transforms (short transforms) instead. The 512 x 256 matrix that transforms the 512 windowing samples results in 256 frequency coefficients. The equation of this matrix is shown below: cos( (2 1)(2 1_ (2 1)) 1024 4 512 256 nkL n k k where L x matrix nk each entry of L π π+ + − < > = + + + + = = The 256 x 128 matrices that transforms the 512 windowing samples results in two sets of 128 frequency coefficients. The equations of these matrices are shown below: 1 1 1 cos( (2 1)(2 1)) 512 256 128 nkS n k where S x matrix nk each entry of S π+ + + < > = + + = = 2 2 2 cos( (2 1)(2 1) (2 1)) 512 2 256 128 nkS n k k where S x matrix nk each entry of S π π+ + + < > = + + + + = = PCM time samples Analysis Filter Bank Frequency Domain Transformation Block Floating-PointMantissa Exponent
  • 14. Block Floating-Point The input to this block is the set of all transform coefficients that have been computed in the frequency domain transformation block. The floating point block simply breaks down each of the frequency transform coefficients into binary exponent and mantissa parts which will be analyzed separately in the succeeding blocks. [1] 2.3.2. Spectral Envelope Encoding The Spectral Envelope Encoding involves two main processes, the power spectral mapping and frequency banding. The exponents are first encoded according to time and frequency resolution where they are mapped to the power spectrum so that the computation of this spectrum can be simplified. The frequency banding then maps the MDCT frequency domain to Fourier Frequency domain where the psychoacoustic model is used to determine the perceptual resolution. Figure 2-9 – Spectral Envelope Encoding Block Diagram 2.3.3. Encode Bit Allocation The goal of bit-rate reduction is to remove bits from a representation of an audio signal in a manner that is inaudible to humans. The bit allocation block of the AC-3 coding standard uses the spectral envelope to determine the bit allocation required to encode the individual mantissa. This block performs a convolution of the power spectral density with the spreading function matching the ear’s masking curve. The bit allocation block can be subdivided into smaller blocks as shown in Figure 2-10 below. [1] Figure 2-10 – Encode Bit Allocation Block Diagram Bit Assignment Transform Coefficient Exponents Spectral Envelope Encoding Power Spectrum Mapping Frequency Banding Power Spectral Density Encoded Spectral Envelope Encode Bit Allocation Spreading Function Hearing Threshold Masking Comparison Power Spectral Density
  • 15. There are two broad classes of bit-allocation strategy. These are the forward adaptive bit allocation and the backward adaptive bit allocation. AC-3 uses hybrid forward/backward bit allocation, a fusion of these two methods. [1] Forward Adaptive Bit Allocation This is a method where the encoder may precisely calculate an optimum bit allocation and explicitly code this allocation into the bit stream. It is the most accurate allocation technique since the encoder has full knowledge of the input signal. Also, since the psychoacoustic model is in the encoder itself, this can be changed without affecting the installed encoders. The block diagram is shown in Figure 2-11. However, there is a need to utilize a portion of the available bit-rate to deliver the explicit bit allocation to the decoder. Transient conditions require fine time resolution while steady state conditions require fine frequency resolution. These improved resolutions require higher allocation in the data rate. Thus, forward adaptive puts practical limitation on performance at very low bit rates. Figure 2-11 – Forward Adaptive Bit Allocation Encoder Block Diagram Backward Adaptive Bit Allocation This is a method where the encoder calculates bit allocation from the coded audio data without explicit information from the encoder. It does not require utilization of a portion of the available bit-rate, thus all of the bits can be allocated to audio coding. It is more efficient in transmission and allows the bit allocation to have superior time and frequency resolution. The block diagram is shown in Figure 2-12. However, the bit allocation must be calculated in the decoder from the information contained in the bit stream, thus of limited accuracy and may contain small errors. Also, psychoacoustic model may not be updated since the bit allocation algorithm in the encoder is fixed. Bit Allocator Filter Bank Quantize Multiplex Input signal Bit allocation Coded bit stream
  • 16. Figure 2-12 - Backward Adaptive Bit Allocation Encoder Block Diagram Hybrid Forward/Backward Bit allocation The Hybrid Forward/Backward Bit allocation is the method implemented by the AC-3 encoding algorithm. It uses the core backward adaptive bit allocation process found in both the encoder and the decoder. This core routine is accurate and based on the psychoacoustic model which has assumptions on the masking properties of signals. The block diagram is shown in Figure 2-13. The input to the core routine is a spectral envelope which is part of the encoded audio. In AC-3, the spectral envelope has time/frequency resolution equal to the analysis filter bank. Figure 2-13 – Hybrid Forward/Backward Bit Allocation Encoder Block Diagram 2.3.4. Mantissa Quantization The results of the parametric bit allocation model are used along with the raw mantissa to perform the mantissa quantization. The mantissas are quantized with a variable number of bits based on the results of the bit allocation block. The resulting values in this quantization block are not just composed of a specified number of significant bits. The mantissa values are scaled and offset to provide zero-centered, equal-width and symmetrical quantization levels. [5] Encode Spectral Envelope Filter Bank Quantize Core Bit Allocator Input signal Bit allocator Multiplex Bit Allocation Side Information Encode Spectral Envelope Filter Bank Quantize Multiplex Input signal Bit allocator Encoded Spectral Envelope Coded Bit-stream Quantized Samples Coded Bit-stream
  • 17. 2.3.5. Frame Formatting and Bit Stream Packing The frame formatting and bit stream packing is the final stage in the AC-3 encoding algorithm. All information in each of the six blocks is placed in a series of array and scalar values. This information includes the bit allocation information, exponents, mantissas, coupling coefficients and dither flags. [5] This data packing stage places all information in a single block along with all the necessary data such as the synchronization, header, and wrapper information. The data is coded logically for the decoder to easy unpack the bit stream data. 2.4 AC-3 Decoding Algorithm The AC-3 decoding algorithm is composed of five major blocks as shown in Figure 2-14. The major blocks are basically the same as those in the encoding process implemented inversely. These are the Frame De-Formatting and Bit Stream Unpacking, Spectral Envelope Decoding, Decode Bit Allocation, Mantissa De-quantization, and the Synthesis Filter Bank. The block diagram of the decoding algorithm is shown below: Figure 2-14 - AC-3 Decoder Block Diagram 2.4.1. Frame De-Formatting and Bit Stream Unpacking The AC-3 Frame De-Formatting block shown consists of frame synchronization, error detection and frame de-formatting stages. The frame synchronization ensures that the decoder maintains the sync with the incoming input data stream. [7] The encoded digital audio input enters an input buffer before proceeding to the error detection stage. Each of the decoder input data block is checked in the error detection stage for consistency. In the case that an error is found and it cannot be resolved, the last known correct block can be used as a substitute for the erroneous block. The substitution can be Frame De-Formatting and Bit Stream Unpacking Mantissa De-Quantization Spectral Envelope Decoding Encoded Spectral Envelope Decode Bit Allocation Encoded AC-3 bit stream Synthesis Filter Bank Mantissa Quantized Mantissa Exponents PCM Time Samples
  • 18. done several times until a consistent block is found, unless there are extended error conditions which forces the decoder to mute or revert to analog. [5] After the synchronization and error checking, the data can now be unpacked. This process is simply a reverse of the frame formatting stage. All audio data information are taken out to recover the bit information. 2.4.2. Spectral Envelope Decoding The purpose of the spectral envelope decoding block is to recover the exponents from the original encoded data input after it has undergone the frame de-formatting stage. This will allow the data to be analyzed together with the de-quantized mantissa in the synthesis filter bank to get the original PCM samples. 2.4.3. Decode Bit Allocation The result of the decode bit allocation block together with the quantized mantissa are analyzed to get the raw mantissa information. This block uses transmitted intermediate result for less decoding time. It also allows modification of the derived bit allocation for computation of each channel one at a time. 2.4.4. Mantissa De-Quantization After the decoded bit allocation process, the mantissa is recovered from the quantized state. This is done by using the specified information on the size of each mantissa which can be taken from the decoded bit allocation output. 2.4.5. Synthesis Filter Bank using IMDCT This is the final stage in the AC-3 decoding process. Before the synthesis filter bank takes place, data is first converted to fixed point. This means that the mantissa and exponent are combined to reconstruct the original frequency coefficients. Each of the channel’s recovered transform coefficients are then inversed transformed back to time domain using the Inverse Modified Discrete Cosine Transform (IMDCT), windowed and overlap added to produce the original PCM time samples. The Synthesis Filter Bank is represented by the following equations: 1 0 '( , ) ( , )cos[ (2 1)(2 1) (2 1) ] 4 4 M k X m n Y m k k n k M π π− = = + + + +∑ n = 0,1,… 2M-1 (3) '( ) ( ) '( 1, ) ( ) '( , )x n h n M X m n M h n X m n= + − + + n = 0,1,… M-1 (4) where: x’(n) = reconstruction signal from the synthesis filter bank h (n) = window function
  • 19. 2.5 Related Works This project intends to implement the AC-3 audio coding standard by reviewing related studies on filters and fast algorithms for efficient implementation. Comparison of filter bank designs will be the main focus of this project. 2.6.3. The ATSC Standard: Digital Audio Compression (AC-3), Revision A For technical guidelines on AC-3 encoding and specifications, the ATSC Standard: Digital Audio Compression Revision A will be used. The AC-3 bit stream is specified mainly by the syntax and decoder processing, thus the encoder is not precisely defined. The only normative requirement on the encoding algorithm is that it is able to produce a bit stream that follows the AC-3 syntax. ATSC documentation will be checked for consistency of the encoder implementation. 2.6.4. FFmpeg Multimedia System For the source code reference, the AC-3 encoder included in the codec library of the FFmpeg project will be used. FFmpeg Multimedia System is a complete solution to record, convert and stream audio and video. It is a trademark of Fabrice Bellard, originator of the FFmpeg project and is an experimental and developer-driven project. However, there are no formal releases yet. The project is made of several components: • ffmpeg – command line that converts video format to another • ffserver – http multimedia streaming server for live broadcasts • ffplay – simple media player • libavformat – library containing parsers and generators for all common audio/video formats • libavcodec – the leading audio/video codec library that contains all encoders/decoders. Most of codecs were developed from scratch to ensure quality performances and high code reusability. It is developed under Linux but it can be compiled under most operating system including Windows.
  • 20. Chapter 3 Methodology The overview of the project is shown in Figure 3-1. For the first part, a reference code was used which was modified to create a standard encoding algorithm. Different designs were then implemented to come up with modified encoding algorithms. Finally, quality performance of these encoder designs were then compared and analyzed using different objective and perceptual tests to arrive at an optimum code which will be selected among all of the implementations. 3.1 Design The AC-3 encoder consists of the following information: • AC3_encode_init – sets the correct parameters for the AC-3 encoder • AC3_encode_close – outputs the correct AC-3 bit stream • AC3_encode_frame – contains the actual AC-3 encoding processing The main encoding process takes place in the AC3_encode_frame. It contains several sub functions that perform the different encoder processing requirements which can be summarized in Figure 3-2. These include: • mdct512 – calculates the MDCT of the 512 input time samples • compute_exp_strategy– along with its sub blocks, this computes the exponent strategy that will be used by the encoder • compute_bit_allocation – along with its sub blocks, calculate the number of bits needed by the encoder using mantissa information • output_frame_header, output_frame_end, output_audio_block – frame packing and bit stream formatting
  • 21. Figure 3-1 AC-3 Encode Frame Since the project deals with comparing different filter bank designs, it is necessary that the other parts of the encoder will not be modified so as not to affect its performance. Therefore, the different designs directly refer to the filter bank implementations. As mentioned previously, the AC-3 filter bank uses an oddly-stacked TDAC. The general equation that characterizes TDAC is shown below: ( ) ( )( ) ( )( ) 0 0 cos ( ) 1 cos 2 2 sin ( ) 1 sin 2 2 ( ) ( ) cos( ) sin( ) 2 2 k k n k n k m mK X m x n h P n w n n m mK x n h P n w n n where K blocksize x n time domain sequence X m frequency data of band k in time block m m m and swit π π π π ∞ =−∞ ∞ =−∞     = − + − +            + − + − +        = = = = ∑ ∑ ( )( ) ( )( )0 0 0 cos sin ( ) & ( ) k k ches deciding which path according to block number w n n and w n n analysis and synthesis filters n phase delay h n f n windows with size P + + = = = By setting wk = (2 ( 1/ 2) )k Kπ + , replacing –r=mK/2-n and performing deductions, we can rewrite the equation as a general form of MDCT. exponent_min output_frame_header encode_exp compute_exp_strategy compute_bit_allocation mdct512 output_frame_end output_audio_block calc_exp_diff bit_alloc_masking bit_alloc compute_mantissa_size AC3_encode_frame
  • 22. The analysis filter bank is the processing of 512-time sample from time to frequency domain where the current 256 input time samples are concatenated with the 256 time-sample of the previous set. The filter bank is completed in two major processes: windowing and MDCT implementation. The PCM samples are windowed through a normalization procedure by using a Kaiser-Bessel window with 50% overlap between adjacent blocks. The Kaiser-Bessel window can be described as: 0 0 [ , ] [ ] , 0 2 [ , ] n p KBD Left n p w p N W n n w p α α = − = = ≤ < ∑ ∑ [ ] 2 / 4 1.0 / 4 N where [ , ] , 0 n 2 o o n N I N w p I πα α πα  −  −      = ≤ ≤ 2 0 and ( / 2) [ ] ! k o k x I x k ∞ =   =     ∑ For AC-3, α =4. The window characteristic described above is shown graphically in Figure 3-2. The figure represents the transform window sequence whose values are specified by the ATSC standard shown in Table 3-1. Figure 3-2 AC-3 Window Sequence Due to the window specification, the design modification now depends on the type of MDCT implementation. This can be classified as (1) Direct (2) Indirect (3) Recursive. Direct method, or traditional method, is the straightforward execution of the MDCT block. The indirect method refers to the use of fast algorithms such as Fast Fourier Transforms (FFT). The third method simply refers to algorithms that are recursive in nature.
  • 23. Table 3-1 AC-3 Transform Window Sequence (w[10A+B]) 3.1.1 Direct Method Given the time domain signal samples in a signal block ( )x n where n=0 to N-1 windowed by an N sample real Kaiser-Bessel derived window w(n) to obtain x(n) , the complete MDCT of x(n) is: 1 0 2 2 ( ) ( )cos[ (2 1)(2 1) (2 1)(1 )] k=0,1,2,...(N/2)-1 4 4 1 for the first short transform 0 for the long transform 1 for the second short transform N D n X k x n n k k N N π π α α − = = − + + + + + −  =  + ∑ Removing the scale factor and setting α =0 simplifies the MDCT to: 1 0 (2 1 /2)(2 1) ( ) ( )cos k=0,1,2,...(N/2)-1 2 N n n N k X k x n N π− =       + + + = ∑
  • 24. 3.1.2 Indirect Method using DCT-IV The direct or traditional method can be accomplished by K/2 points DCT-IV by using a series of trigonometric identities: If cos( + )=cos cos sin sin , cos (2 1) 2(2 1 ) 1 cos (2 1)(2 1) 4N 4 sin (2 1) 2(2 1 ) 1 sin (2 1)(2 1) 4N 4 X(m,n) input mth block signal consisting of 256 k N n k n N k N n k n N α β α β α β π π π π                             − + − − + = − + + + − − + = + + = D new input points and 256 old input points Y(m,k) = output X ( ) h(n) = window function x(n) = X(m,n)h(n) The MDCT of x(n) can be rewritten as: ( ) ( ) ( , ) cos 2(2 1)(2 1) cos (2 1) 4 4D k X k h n X m n n k k N π π      = + + + 1 0 1 0 sin 2(2 1)(2 1) sin (2 1) k=0,1,2,...(N/2)-1 4 4 ( , ) cos (2 1) ( ) ( , )cos (2 1)(2 1) 4 2 -sin (2 1) 4 N n N n n k k N Y m k k h n X m n n k N k π π π π π − = − =                                          − + + + = + + + + ∑ ∑ 1 0 / 2 1 0 ( ) ( , ) sin (2 1)(2 1) k=0,1,2,...(N/2)-1 2 ( , ) cos (2 1) [ ( ) ( , ) ( 4 N n N n h n X m n n k N Y m k k h n X m n h N π π − = − =                 + + = + − ∑ ∑ / 2 1 0 1 ) ( , 1 )]cos (2 1)(2 1) 2 -sin (2 1) [ ( ) ( , ) ( 1 ) ( , 1 )] sin (2 1)(2 1) k=0,1,2,...(N/2) 4 2 N n n X m N n n k N k h n X m n h N n X m N n n k N π π π− =                         − − − − + + + + − − − − + +∑ -1 The final analysis filter bank equation can be recognized as a DCT-IV whose implementation structure is illustrated above. This implementation structure is illustrated in Figure 3-3. Figure 3-3 256-point DCT-IV Implementation Structure 0 256 0 1 DCT 1 : : 254 254 255 255 z-1 z-1 z-1 z-1 z-1 h(0) z-1 z-1 z-1 h(1) h(254) h(255) -1 -1 -1 -1 Co Y(m,0) C1 Y(m,1) C254 Y(m,254) C255 Y(m,255) x(n)
  • 25. 3.1.3 Indirect Method using FFT Rewriting the direct implementation of the MDCT gives the form shown by the equation below: 0 2 1 2 21( ) 2 2 0 [ ] Re [ [ ] [ ] ] n knNj n k j j N N N n X k e x n w n e e π π π−− + − − =    =      ∑ This can be implemented using a series of steps: pre-twiddle, N-point FFT, post-twiddle. The pre-twiddle step is the complex multiplication of the input samples by 2 2 n j N e π − . A 512- point FFT is then performed on the pre-twiddled data. The post-twiddle step then takes the real part of the transformed data and multipliying it by a factor 0 2 1 ( ) 2 j n k N e π − + . ( ) ( )( ) ( ) ( )( ) 2 ( 1/ 8) / 2 ( 1/ 8) / ( ' ) ' ( 3 / 4) 0 k n/4-1 '( ) ( / 4) n/4 k n-1 ' 2 ' 2 1 ' / 2 2 ' / 2 2 1 , 0 k n/4-1 ' (2 ) Re( '( )) (2 1) Im( ' m k m m k i k n k k i m n n FFT f F F e f k n f k f k n f f k f n k i f n k f n k f f e F F m F m F m F π π + + = = − + ≤ ≤ = − ≤ ≤ = − − − + + − − + ≤ ≤ = = + =    / 4 1 0 to n/4-1 )m m − − =    3.1.4 Recursive Method The direct method can also be implemented recursively. One traditional recursive method is as follows: Let ' ( / 4) and '' ( 3 / 4)n n N n n N= + = − [ '] [ ' / 4] for n'=N/4,...N-1 [ ''] [ '' 3 / 4] for n''=0,...N/4-1 y n x n N y n x n N = − = + The direct implementation can then be written as, / 2 1 0 1 [ ] ( [ ] [ 1 ]) (2 1) 2cos 2 (2 1)( 1) [ ] [ ]cos N n u n y n y N n n N n k U k u n N π π− = = − − − + + + = ∑ [ ] [ 1] [ ]X k U k U k= + +
  • 26. This previous equations also represents the DCT-IV. By applying trigonometric identities, we get / 4 / 4 1 / 4 2 / 4 / 4 1 / 4 2 [ ] [ ] ( 1) [ / 2 1 ] [ ] cos ( [ ] [ 1]) 2cos [ ] [ ] 2 [ ] sin ( [ ] [ 1]) 2cos [ ] [ ] 2 k k N k k k N N k N k k k N N p n u n u N n D k p j p j D k D k S k p j p j S k S k θ θ θ θ − − − − = + − − − = − − + − = − − + − Finally, the output X[n] can be derived by using these equations: / 2 / 4 1 / 4 1 ( 1) / 2 / 4 1 / 4 1 [ ] ( 1) ( [ ] [ 1])for k is even [ ] ( 1) ( [ 1] [ ])for k is odd k n n k n n X k D k S k X k D k S k − − − − − = − + + = − + + Clenshaw’s Recurrence Formula The proposed algorithms based on a traditional recursive implementation can be formally obtained using Clenshaw’s recurrence formula which is an elegant and efficient way of evaluating the sum of the coefficients.. The formula states that the sum of a given function f(x) can be obtained using this equation: 1 1 2 2 1 3( ) ( ) ( 1, ) ( ) ( )N n N N N Nf x c F x N x F x y F x yβ− − − − − −= − − − Applying it for the MDCT implementation, let 1 2 k k M π θ   = +    and define 2 1 1 2 0 ( ) 2cosn k N N a a a x n a aθ − − − − = = = + − ( ) ( ) ( )1 2 2 1 3( ) ( 1) N k N k N N k NX k x N F F a F aθ θ θ− − − − −= − + − Appplying mathematical transformation to the direct method finally gives the output X(k) as: 1 2( ) cos ( 1) cos ( 1) 2 2 k k N NX k M a M a θ θ − −     = − − + +        The implementation structures of the traditional recursive transform and the recursive solution using Clenshaw’s recurrence formula are shown in Figures 3-4(a) and 3-4(b) respectively.
  • 27. (a) (b) Figure 3-4 (a) Implementation structure for (a) traditional recursive method (b) recursive based on Clenshaw’s recurrence formula 3.2 Implementation FFmpeg Multimedia System was the main reference used for implementing the designs using the C programming language. The code modification was done in an IDE environment using Microsoft Visual Studio 2005. However, the reference can only be compiled in Windows using selected software that can complete the actual compiling and producing of the win32 binaries. Thus, the compilation was done in a command line Linux environment using the MingGW and Minimal SYStem (MSys) system with an integrated gcc compiler. MSys is a branch of Cygwin used in Windows OS for creating configuration and packages. It also offers user environment for MinGW development and allows users of MinGW version of GCC to port and build packages in a GNU familiar way without the UNIX complexities brought about by Cygwin. The implementation procedure involves a direct application of the different methods for the MDCT processing block.
  • 28. 3.3 Testing There were initial tests done to evaluate the performance of the designs. A preliminary listening test was already implemented to determine the test samples to be used for the objective test. A formal listening test will be done during the last phase of the project. Figure 3-5 Flowchart of the Testing Procedure 3.3.1 Preliminary Test The main objective of the preliminary test was to determine the type of audio samples that are best suited for the objective tests. Different types of audio vary in frequency contents. Thus to produce rational results for the objective analysis, worst case scenario must be considered. This scenario implies that the encoded audio sample should be far from the original PCM audio. If the difference between the quality of the encoded audio sample and the original audio is almost unnoticeable, we can immediately expect that the results will be almost identical. The International Telecommunication Union [ITU-R BS.1284] provides the appropriate guidelines and scale to be used for quality assessment. In determining the relative difference of two samples with quality far from each other, the seven-grade comparison scale shown in the figure below is recommended. Grade Comparison 3 Much Better 2 Better 1 Slightly Better 0 The Same -1 Slightly Worse -2 Worse -3 Much Worse The system used was the Multiple Stimulus with Hidden Reference and Anchors (MUSHRA) method. It is a double blind multiple stimulus test method with hidden Preliminary Test Objective Tests Formal Listening Test Traditional and Modified Designs Listening and Objective Test Results
  • 29. reference. This setup involves a reference audio that can be compared against multiple test samples, wherein one of the test samples is the actual same copy of the reference sample unknown to the listener. Ideally, the user must be able to give a grade of zero since one of the test samples is its duplicate. For this project, the reference audio used was the original PCM audio. One of the test samples was a copy of the original and the rest of the test samples were the different designs used for implementing the AC-3 encoding algorithm. The preliminary listening test was then performed using three different PCM audio (plain speech, a pop music, and an instrumental song) using randomly selected AC- 3 encoder designs (direct method, MDCT using DCT-IV and MDCT using FFT). In addition to this, preliminary objective test was done to assess the frequency response of the selected audio. 3.3.2 Objective Tests After getting the results of the preliminary listening test, the audio sample was selected and the objective tests were performed using MATLAB 7.0. However, since MATLAB is unable to directly read AC-3 formats, the encoded audio sounds were first recorded to raw PCM format that MATLAB recognizes. The software used was Audacity, an open source program capable of recording sounds without the need for microphone and speaker set-up. This ensures that there is no degradation in audio quality and the characteristics of the encoded audio format can truly be analyzed. Moreover, stereo separation into left and right channels was done in order to study the correlation and frequency response of the encoded AC-3 audio. The autocorrelation of the original PCM audio was plotted against the correlation between the original audio and each design. This will illustrate whether or not the encoded AC-3 audio and the original PCM are highly correlated. The frequency response of each design was then determined to verify if there are noticeable differences compared to that of the original PCM audio. The flowchart below summarizes the procedure done for the objective tests.
  • 30. Figure 3-6 Flowchart of the Objective Test Procedure 3.3.3 Formal Listening Test Formal listening test is considered the ultimate measurement of audio quality. The test was done in a similar manner to that in the preliminary listening tests. However, the scale used for the formal listening test is the five-grade impairment scale. This is defined by the [ITU-R BS.562-3] as the scale used for systems with small impairments. This is related to the five-grade quality scale as shown below. The test method used is still a double blind multiple stimulus test method with hidden reference. Grade ITU-R Grade 5 Imperceptible 4 Perceptible, but annoying 3 Slightly annoying 2 Annoying 1 Very annoying 3.3.4 Playback The open source encoder AC3Filter developed by Alexander Vigovsky on August 2006 [9] will serve as the AC-3 main decoder. AC3Filter is a high quality free DirectShow AC-3 filter that allows decoding of videos with AC3-encoded surround audio. The source is composed of the filter and an audio processing library. The filter implements all interfaces and logic required by DirectShow while the audio processing Objective Analysis - frequency response and correlation Objective Test Results Channel Separation - to left and right channels in MATLAB AC-3 recording - to PDCM audio using Audacity Traditional and Modified Designs
  • 31. library implements all audio decoding and processing algorithms. It supports configuration settings that allow control of audio sound intensity and can decode AC- 3/DTS/MPEG1/2 Audio Layer I/II formats. Some of the features include decomposition to six channels, full information about the audio track format, per channel amplification for all input/output channels and per channel delays. 3.3.5 Listening Conditions The listening tests were performed with the listeners having direct control on the listening conditions. The direct control includes being allowed to repeat the audio samples and change the audio volume level for. They were asked to use headphones throughout the duration of testing.
  • 32. Chapter 4 Results and Analysis 4.1 Objective Measurements The preliminary objective test done showed that the different encoder designs produced an encoded audio with frequency characteristics similar to that of the original PCM input audio. The test was done on three different audio inputs namely pop music, pure speech, and an instrumental song. Initially, three encoder designs were considered. These are the direct implementation, and two modified encoder designs. The first modified design is the filter bank implementation based on DCT-IV and the other one is based on FFT. The frequency responses of the designs show minimal differences. The low frequency components of the encoded audio show unnoticeable changes from the original input PCM but some high frequency artifacts were present after the encoding process. The encoder design with an MDCT implementation using FFT showed more visible artifacts than the other designs. The frequency response of the original instrumental audio PCM input (electric piano) is plotted in Figure 4-1 followed by those of the modified designs in Figure 4-2. (a) (b) Figure 4-1 Frequency Response of the Original Input for (a) Left (b) Right Channels
  • 33. (c) (d) (e) (f) (g) (h) Figure 4-2 Frequency Response of the Different Designs using: (c) - (d) Direct Method , (e) –(f) Indirect Method DCT-IV , (g) - (h) Indirect Method FFT for both left and right channels The crosscorrelation of the original input PCM and the encoded audio for each design was also determined and is illustrated in Figure 4-3.
  • 34. (a) (b) Figure 4-3 Autocorrelation of the Original Input for (a) Left (b) Right Channels (c) (d) (e) (f)
  • 35. (g) (h) Figure 4-4 Correlation of the Different Designs and Original Input PCM using: (c) - (d) Direct Method , (e) –(f) Indirect Method DCT-IV , (g) - (h) Indirect Method FFT for both left and right channels The results above show that there is a high correlation between the encoded audio and the original input PCM. Though frequency response and correlation present a general insight on the frequency components present in an audio sample and its correlation with the original input, it shows little or no information on the quality of the encoded audio aside from the noticeable appearance of frequency artifacts. 4.2 Audio Test Samples The preliminary listening test done reveal that the listener was not be able to recognize the impairments among the different designs immediately for ordinary pop music, but it became more obvious when the input audio sample used contains high frequency components, and thus, high pitch. This is in line with the result of the frequency response mentioned earlier when frequency artifacts appear after encoding. It is also more noticeable when the audio sample is made up of a single instrument or speech. The ensemble of musical instruments masks the impairments brought about by the encoder design. Table 4-1 shows the average grade given by 10 listeners. Test Sample Encoder Design Arithmetic Mean Pop Music Original -0.3333 Direct 0 Indirect using DCT 0.6667 Indirect using FFT 1.125 Speech Original 0.3333 Direct -1 Indirect using DCT -1.1111 Indirect using FFT -1.5556 Table 4-1 Result for the preliminary listening tests
  • 36. Having realized this, the selections of critical material used are listed in Table 4-2. The test samples are composed primarily of different musical instruments expect for speech and pop music. Input Test Samples 1. CLARINET 2. DRUMSET 3. GUITAR_CONCERT 4. GUITAR_ELECTRIC 5. PIANO_CONCERT 6. PIANO_ELECTRIC 7. POP MUSIC 8. SAXOPHONE 9. SPEECH Table 4-2 List of critical materials to be used
  • 37. Chapter 5 Project Schedule 5.1 Revised Gantt Chart Week no. 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 A B C D E F G H Tasks A Implementation and Testing of the analysis filter bank B Implementation and testing other encoder blocks C Implementation and Testing of the 1st and 2nd modified filter bank designs D Implementation and Testing of the 3rd and 4th modified filter bank designs E Listening tests and comparative analysis of the designs F Testing of other optional AC-3 compression features G Final Comparative analysis and Listening Tests H Documentation 5.2 Halfway-Point Deliverables The standard integrated code, test results and documentation for each of the following individual blocks in the encoder: • Implementation of four different filter bank designs • Objective test results • Preliminary listening test results 5.3 Final Deliverables Task
  • 38. The final deliverables include four filter bank designs and analysis of AC-3 audio compression feature. The comparative analysis of four different filter bank implementations before the conclusion of the optimum final code will also be presented, stating grounds for selecting the design. All functional and listening test results will also be presented along with the full documentation of the project implementation.
  • 39. Bibliography [1] C. Liu and W. Chang. “Audio Coding Standards”. National Chiao Tung University, Taiwan. [2] Digital Audio Formats. http://www.teamcombooks.com/mp3handbook/12.htm Feb 10, 2007 [3] Dolby AC-3 Audio Coding. http://www.fedele.com/website/hdtv/dolbyac3.htm Feb 10, 2007 [4] Dolby AC-3. http://www.mp3-tech.org.htm. Feb 10, 2007 [5] M. Davis. “The AC-3 Multichannel Coder”. Dolby Laboratories Inc., San Francisco, 1993. [6] C. Todd, G. Davidson, M. Davis, L. Fielder, B. Link, and S. Vernon. “AC-3: Flexible Perceptual Coding for Audio Transmission and Storage”. Dolby Laboratories Inc., San Francisco, 1994. [7] A. Alvarez, S. George, H. Yang, and K. Das. “SGS-Thomson DSP D950-Core Audio Compression”. DSP R&D Centre SGS-Thomson Microelectronics”. [8] Y. Chen, C. Tsai, and J. Wu. “Fast Time-Frequency Transform Algorithms and their Applications to Real-Time Software Implementation of AC-3 Audio Codec”. National Taiwan University. [9] AC3Filter Project. http://ac3filter.net/ Feb 10, 2007 [10] AC3Decode. http://www.users.on.net/%7Ersobon/ac3dec.html Feb 10, 2007 [11] Z. Zhao. “Efficient Implementation Structures of AC-3’s Filter Banks.” Hangzhou Institute of Electronic Engineering, Hangzhou China. [12] S. Lee and C. Liu. “Transformation from 512-point transform coefficients to 256- point transform coefficients for Dolby AC-3 decoder”