A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Algorithm for Real-time Applications

Paper ID 8521
A. Primavera, S. Cecchi, P. Peretti, L. Romoli, and F. Piazza
A3Lab - DIBET - Università Politecnica delle Marche
Via Brecce Bianche 1, 60131 Ancona Italy
www.a3lab.dibet.univpm.it
Abstract
FIR convolution is a widely used operation in digital signal processing, especially for
filtering in real-time scenarios. In this context, computationally inexpensive techniques
for calculating convolutions with low input/output latency become essential, since the
real-time requirements are strictly related to the impulse response length. In this paper,
a multithreaded real-time implementation of a Non Uniform Partitioned Overlap and Save
algorithm is proposed with the aim of lowering the workload required in applications such
as reverberation, also exploiting human ear sensitivity. Several results are reported to
show the effectiveness of the proposed approach in terms of computational cost, considering
different impulse responses and including comparisons with existing state-of-the-art
techniques.

Introduction
Problem
FIR filtering is probably one of the most recurrent operations in DSP. It is an expensive task,
especially for long impulse responses (IRs) and low I/O latency. Two goals follow:
• Low latency convolution;
• Computational cost minimization.
State of the Art
In the last 30 years, fast convolution algorithms have been deeply investigated:
• OverLap and Save (OLS), OverLap and Add (OLA) [1];
• Partitioned OverLap and Save (POLS) [2, 3, 4];
• Non Uniform Partitioned OverLap and Save (NUPOLS) [5, 6].
Proposed Solution
We propose a real-time implementation of a NUPOLS algorithm based on:
• Automatic partitioning;
• Multithreaded implementation;
• Psychoacoustic improvement.

Convolution (1)
Assuming a linear time-invariant system, the linear convolution between the input signal x and the
system impulse response h is defined as follows:
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau. \tag{1}$$
For discrete-time signals and an impulse response of finite length N, this becomes:
$$y[n] = x[n] * h[n] = \sum_{m=0}^{N-1} x[n - m]\, h[m]. \tag{2}$$
Time Domain Convolution
The convolution is performed directly using equation (2).
LATENCY: Theoretically zero;
COMPUTATIONAL COST: N − 1 additions and N multiplications per output sample;
CONSIDERATIONS: Too expensive for long IRs (high values of N).
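As a minimal sketch of the time-domain approach, the double loop below evaluates the discrete convolution sum directly, assuming x[n] = 0 for n < 0. The function name `fir_direct` and the test signal are illustrative and not taken from the poster.

```python
import numpy as np

def fir_direct(x, h):
    """Direct FIR filtering: y[n] = sum_{m=0}^{N-1} h[m] * x[n - m], with x[n] = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    N = len(h)
    y = np.zeros(len(x))
    for n in range(len(x)):
        # Only taps that reach back to existing samples contribute.
        for m in range(min(N, n + 1)):
            y[n] += h[m] * x[n - m]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.5])
y = fir_direct(x, h)  # [1.0, 2.5, 4.0, 5.5]
```

Each output sample is available as soon as its input sample arrives, which is why the latency is theoretically zero, but the inner loop grows linearly with the IR length N.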
Frequency Domain Convolution
Taking into consideration the circular convolution and the DFT property:
$$y[n] = x[n] \circledast_N h[n] = \sum_{m=0}^{N-1} x[(n - m)_N]\, h[m], \qquad x[n] \circledast_N h[n] \leftrightarrow X[k]\, H[k], \tag{3}$$
where $(\cdot)_N$ denotes the index taken modulo N, it follows that the convolution can be computed in
the frequency domain.
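The DFT property can be checked numerically. The sketch below, with illustrative function names, computes the circular convolution both from its definition and as a product of DFTs.

```python
import numpy as np

def circular_convolve_direct(x, h):
    """Circular convolution from its definition: y[n] = sum_m x[(n - m) mod N] * h[m]."""
    N = len(x)
    return np.array([sum(x[(n - m) % N] * h[m] for m in range(N)) for n in range(N)])

def circular_convolve_fft(x, h):
    """Same result through the DFT property: multiply the spectra, then transform back."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.5, 0.0, 0.0])  # filter zero-padded to the length of x
y_direct = circular_convolve_direct(x, h)
y_fft = circular_convolve_fft(x, h)
# Both give [3.0, 2.5, 4.0, 5.5]; the first sample differs from the linear
# convolution (1.0) because the last input sample wraps around.
```

The wrapped first sample is the time-domain aliasing that overlap-based methods must discard to recover a linear convolution.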

Convolution (2)
OverLap and Save (OLS)
The OLS algorithm makes it possible to obtain a linear convolution from a circular convolution.
LATENCY: Equal to K samples, with K > N;
COMPUTATIONAL COST: $\frac{2L\log L}{K} + \frac{L}{K}$ complex multiplications (with K a power of 2 and L = 2K for
50% overlap);
CONSIDERATIONS: The I/O latency is too high for long IRs (high values of N).
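A minimal overlap-and-save sketch follows, using the poster's notation (K new samples per block, FFT size L = 2K). It processes a whole signal offline rather than streaming, and the function name, buffer layout and test signal are assumptions made for illustration.

```python
import numpy as np

def overlap_save(x, h, K):
    """Overlap-and-save with FFT size L = 2K (50% overlap); requires len(h) <= K."""
    L = 2 * K
    assert len(h) <= K, "the whole IR must fit in one block"
    H = np.fft.rfft(h, L)                       # zero-padded filter spectrum
    n_blocks = -(-len(x) // K)                  # ceil(len(x) / K)
    x_pad = np.concatenate([x, np.zeros(n_blocks * K - len(x))])
    buf = np.zeros(L)                           # [previous K samples | current K samples]
    y = np.empty(n_blocks * K)
    for b in range(n_blocks):
        buf[:K] = buf[K:]                       # slide the old block to the front
        buf[K:] = x_pad[b * K:(b + 1) * K]      # append the new block
        y_circ = np.fft.irfft(np.fft.rfft(buf) * H, L)
        y[b * K:(b + 1) * K] = y_circ[K:]       # keep only the K alias-free samples
    return y[:len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
h = rng.standard_normal(8)
y_ols = overlap_save(x, h, K=8)
```

Because the block size K has to cover the whole IR, the buffering delay grows with the IR length, which is the latency limitation noted above.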
Uniform Partitioned OverLap and Save (POLS)
The IR is partitioned into sections of equal size; an OLS is then applied to each sub-filter.
LATENCY: Equal to K samples, with K arbitrarily chosen;
COMPUTATIONAL COST: $\frac{2L\log L}{K} + \frac{LP}{K}$ complex multiplications and $\frac{L(P-1)}{K}$ additions (with K a power
of 2, P the number of partitions and L = 2K for 50% overlap);
CONSIDERATIONS: The required computational cost is higher than in the OLS.
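The uniform partitioning can be sketched with a frequency-domain delay line: one FFT and one inverse FFT per block, plus P spectral multiply-accumulate operations, matching the shape of the cost expression above. This is an offline illustration; the function name `pols` and the variable `X_hist` are hypothetical.

```python
import numpy as np

def pols(x, h, K):
    """Uniformly partitioned overlap-and-save: P sub-filters of K taps, FFT size L = 2K."""
    L = 2 * K
    P = -(-len(h) // K)                                  # number of partitions
    h_pad = np.concatenate([h, np.zeros(P * K - len(h))])
    H = np.fft.rfft(h_pad.reshape(P, K), L, axis=1)      # one spectrum per partition
    n_blocks = -(-len(x) // K)
    x_pad = np.concatenate([x, np.zeros(n_blocks * K - len(x))])
    X_hist = np.zeros((P, K + 1), dtype=complex)         # spectra of the last P input buffers
    buf = np.zeros(L)
    y = np.empty(n_blocks * K)
    for b in range(n_blocks):
        buf[:K] = buf[K:]
        buf[K:] = x_pad[b * K:(b + 1) * K]
        X_hist = np.roll(X_hist, 1, axis=0)              # age the stored spectra by one block
        X_hist[0] = np.fft.rfft(buf)                     # single forward FFT per block
        Y = np.sum(X_hist * H, axis=0)                   # partition p uses the input from p blocks ago
        y[b * K:(b + 1) * K] = np.fft.irfft(Y, L)[K:]    # single inverse FFT per block
    return y[:len(x)]

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
h = rng.standard_normal(50)
y_pols = pols(x, h, K=8)
```

Here the latency is set by K alone, independently of the IR length, at the price of P multiply-accumulate passes per block.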
Non Uniform Partitioned OverLap and Save (NUPOLS)
The IR is partitioned into sections of increasing size in order to reduce the computational cost,
allowing a real-time implementation of zero latency convolution.
LATENCY: Theoretically zero;
COMPUTATIONAL COST: It depends on the adopted partitioning;
CONSIDERATIONS: It is difficult to find the optimal partitioning.

Proposed Algorithm (1)
A real-time implementation of a suitable NUPOLS algorithm is proposed using the NU-Tech framework.
Fig.1 Block diagram of the Non Uniform Partitioned Overlap and Save algorithm.
The proposed approach has three main features:
Automatic Partitioning (1)
The required workload is a function of the number of POLSs employed in the NUPOLS algorithm
[6]. The optimal partitioning depends on the IR length and the I/O latency constraint.
An automatic partitioning procedure is proposed, exploiting an offline pre-analysis based on an
iterative evaluation of the obtained performance, considering that:
• Four partitions are typically enough to obtain good performance;
• Very large FFTs are usually not recommended.
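The poster does not detail the pre-analysis, so the following is only a hypothetical sketch of an offline search in that spirit. It assumes the POLS cost expression given earlier with a base-2 logarithm, power-of-two block sizes, at most four segments, a cap on the block size, and a simplified scheduling rule in which a segment with block size K may only start at an IR offset of at least K. None of these choices are confirmed as the authors' procedure.

```python
import itertools
import math

def pols_cost(K, P):
    """Complex multiplications per output sample of one POLS (L = 2K, log base 2 assumed)."""
    L = 2 * K
    return (2 * L * math.log2(L) + L * P) / K

def plan_for_blocks(N, blocks):
    """Give each block size the fewest partitions that let the next, larger size start."""
    plan, offset = [], 0
    for i, K in enumerate(blocks):
        if offset >= N:
            break
        remaining = math.ceil((N - offset) / K)
        if i + 1 < len(blocks):
            # Simplified rule (assumption): the next segment starts at an offset >= its block size.
            needed = max(1, math.ceil((blocks[i + 1] - offset) / K))
            P = min(needed, remaining)
        else:
            P = remaining
        plan.append((K, P))
        offset += K * P
    return plan

def best_partitioning(N, K0, max_segments=4, max_block=8192):
    """Evaluate every candidate sequence of block sizes and keep the cheapest one."""
    sizes = []
    K = 2 * K0
    while K <= max_block:
        sizes.append(K)
        K *= 2
    best_cost, best_plan = math.inf, None
    for S in range(1, max_segments + 1):
        for tail in itertools.combinations(sizes, S - 1):
            plan = plan_for_blocks(N, (K0,) + tail)
            cost = sum(pols_cost(K, P) for K, P in plan)
            if cost < best_cost:
                best_cost, best_plan = cost, plan
    return best_cost, best_plan

N, K0 = 65536, 64                     # IR length and latency-defining frame size
cost, plan = best_partitioning(N, K0)
uniform_cost = pols_cost(K0, math.ceil(N / K0))
```

`plan` is a list of (block size, number of partitions) pairs, and `uniform_cost` is the cost of a single POLS at the same latency, for comparison under this cost model only.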

Proposed Algorithm (2)
Multithreaded Implementation (2)
Each POLS can be considered as a single thread. This provides:
• Simultaneous execution of different convolutions, with an automatic parallelization of the operations;
• High scalability of the implementation.
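The one-thread-per-sub-filter idea can be illustrated with a thread pool. This is an offline sketch, not the NU-Tech implementation: each worker convolves the whole input with one IR segment using a plain FFT convolution, and the main thread sums the delayed partial outputs. The names `fft_convolve`, `threaded_segment_convolve` and `boundaries` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fft_convolve(x, h):
    """Linear convolution through zero-padded FFTs."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()            # next power of two >= n
    return np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)[:n]

def threaded_segment_convolve(x, h, boundaries):
    """Convolve x with each IR segment in its own worker thread, then sum the results.

    boundaries: increasing IR split points, e.g. [0, 64, 256, len(h)].
    """
    segments = [(start, h[start:stop]) for start, stop in zip(boundaries[:-1], boundaries[1:])]
    y = np.zeros(len(x) + len(h) - 1)
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        futures = [(start, pool.submit(fft_convolve, x, seg)) for start, seg in segments]
        for start, future in futures:
            part = future.result()
            # A segment starting at tap `start` contributes with a delay of `start` samples.
            y[start:start + len(part)] += part   # summed in the main thread, so no data race
    return y

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
h = rng.standard_normal(300)
y_mt = threaded_segment_convolve(x, h, [0, 16, 80, 300])
```

Since the segments are independent, the number of workers scales with the number of sub-filters, which is the scalability property listed above.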
Psychoacoustic Improvement (3)
It is possible to reduce the computational cost by exploiting human ear sensitivity [7].
The number of complex multiplications to be performed can be lowered by taking into consideration
only the spectral components with significant energy content.
Fig.2 Reverberation time. Fig.3 Frequency bins considered in each partition of the NUPOLS.
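The selection criterion is not given here, so the sketch below uses a hypothetical one: in each partition, a frequency bin is kept only if its magnitude is within `threshold_db` of that bin's peak magnitude across all partitions. The synthetic decaying IR, the function names and the threshold values are assumptions and should not be read as the criterion of [7] or of the authors.

```python
import numpy as np

def partition_spectra(h, K):
    """Spectra of the K-tap partitions of h, each zero-padded to L = 2K."""
    P = -(-len(h) // K)
    h_pad = np.concatenate([h, np.zeros(P * K - len(h))])
    return np.fft.rfft(h_pad.reshape(P, K), 2 * K, axis=1)

def significant_bin_mask(H, threshold_db):
    """True where a bin is within threshold_db of its peak over all partitions (assumed rule)."""
    mag = np.abs(H)
    ref = np.maximum(mag.max(axis=0, keepdims=True), np.finfo(float).tiny)
    return mag >= ref * 10.0 ** (-threshold_db / 20.0)

rng = np.random.default_rng(0)
n = np.arange(4096)
h = rng.standard_normal(4096) * np.exp(-n / 300.0)    # synthetic exponentially decaying tail
H = partition_spectra(h, 256)

mask60 = significant_bin_mask(H, 60.0)
mask40 = significant_bin_mask(H, 40.0)
H_pruned = np.where(mask40, H, 0.0)                   # skipped bins need no complex multiplication
kept60, kept40 = mask60.mean(), mask40.mean()         # fraction of multiplications still performed
```

A lower threshold discards more bins in the late partitions, which reduces the number of complex multiplications at the risk of audible artifacts.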

Results (1)
Several tests have been carried out to evaluate the effectiveness of the proposed approach through
objective and subjective comparisons.
Objective Analysis
• Two different tests have been performed:
1. Workload estimation of POLS and NUPOLS algorithms as a function of the IR length.
2. Analysis of the CPU load for three real IRs (small, medium and large size) in order to show the improvement
introduced by the psychoacoustic approach.
• Three different frame sizes (64, 256 and 1024 samples) have been used.
• All tests were run on a PC with an Intel Core 2 @ 2.5 GHz and 2 GB of RAM.
Fig.4 Analysis of the workload as a function of the IR length.
Considerations
• POLS performance is strictly related to the I/O latency constraint.
• NUPOLS achieves better performance than POLS.

Results (2)
Workload of a Partitioned Overlap and Save (POLS) for different IRs and frame sizes (FS), without and with the psychoacoustic approach.

                                      Psychoacoustic
FS     IR       No psychoacoustic     T60     SpeedUp   T50     SpeedUp   T40      SpeedUp
64     Small    96.9                  65.2    1.49      62.1    1.56      55.7     1.74
64     Medium   331.7                 233.1   1.42      197.5   1.68      175.6    1.89
64     Large    709.8                 558.3   1.27      487.8   1.46      467.67   1.52
256    Small    24.8                  16.4    1.51      15.3    1.62      14.4     1.72
256    Medium   76.8                  47.6    1.61      43.7    1.76      41.8     1.84
256    Large    159.9                 118.3   1.35      105.0   1.52      98.1     1.63
1024   Small    7.2                   5.3     1.36      4.7     1.52      4.4      1.61
1024   Medium   21.1                  12.8    1.64      11.9    1.77      11.9     1.77
1024   Large    41.8                  29.8    1.40      26.7    1.56      24.8     1.68

Workload of a Non Uniform Partitioned Overlap and Save (NUPOLS) for different IRs and frame sizes (FS), without and with the psychoacoustic approach.

                                      Psychoacoustic
FS     IR       No psychoacoustic     T60     SpeedUp   T50     SpeedUp   T40      SpeedUp
64     Small    7.9                   6.3     1.26      6.0     1.31      5.8      1.37
64     Medium   13.5                  9.4     1.43      9.1     1.48      9.0      1.49
64     Large    14.9                  13.7    1.08      13.4    1.11      13.2     1.13
256    Small    5.0                   4.0     1.25      3.9     1.28      3.8      1.31
256    Medium   8.2                   6.0     1.37      5.7     1.44      5.7      1.45
256    Large    9.4                   9.3     1.01      8.6     1.10      8.1      1.17
1024   Small    3.3                   2.6     1.27      2.6     1.26      2.5      1.31
1024   Medium   4.9                   4.3     1.14      4.3     1.16      4.1      1.21
1024   Large    7.7                   6.2     1.24      5.8     1.32      5.6      1.37
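
The SpeedUp columns appear to be the ratio between the workload without the psychoacoustic approach and the workload with a given threshold. This reading is inferred from the numbers rather than stated in the tables. A quick check on the POLS row for FS = 64 and the Small IR:

```python
# Workloads from the POLS table, FS = 64, Small IR.
baseline = 96.9                                        # no psychoacoustic approach
with_threshold = {"T60": 65.2, "T50": 62.1, "T40": 55.7}

# Assumed definition: SpeedUp = baseline workload / workload with the threshold applied.
speedup = {name: round(baseline / w, 2) for name, w in with_threshold.items()}
# speedup == {"T60": 1.49, "T50": 1.56, "T40": 1.74}, matching the tabulated values
```

Because the tabulated workloads are themselves rounded, recomputed ratios for other rows may differ from the listed SpeedUp in the last digit.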

Results (3)
Subjective Analysis
Following the MUSHRA guidelines [8] [9], the preservation of audio quality as a function of the
perceptive thresholds (T60, T50 and T40) has been evaluated by 15 listeners.
Fig.5 Listening test results for Small IR. Fig.6 Listening test results for Medium IR.
Fig.7 Listening test results for Large IR.
Considerations
• Using a threshold based on T60 does not affect the perceived audio quality.
• Some artifacts are perceivable when employing T50 and T40.

Conclusions
• A complete review of the most common convolution techniques has been presented;
• A multithreaded real-time implementation of a Non Uniform Partitioned Overlap and Save algorithm has been proposed;
• The proposed algorithm is based on three key points:
– Automatic partitioning of the IR based on an offline analysis;
– Multithreaded implementation to achieve an automatic parallelization of the operations;
– Psychoacoustic optimization to reduce the computational cost.
• Different tests have been carried out according to objective and subjective measures, proving the effectiveness of
the approach in terms of both computational saving and preservation of audio quality.
• Future work will focus on further investigating the threshold used in the psychoacoustic approach and on a
real-time implementation of the presented algorithm on an embedded platform.
References
[1] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, Prentice Hall International Inc., 1999.
[2] B. D. Kulp, “Digital Equalization Using Fourier Transform Techniques,” in Proc. 85th Audio Engineering Society Convention (AES’88), Los Angeles, USA, Oct. 1988.
[3] A. Farina and A. Torger, “Real Time Partitioned Convolution for Ambiophonics Surround Sound,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New York, NY, USA, Oct. 2001.
[4] E. Armelloni, C. Giottoli, and A. Farina, “Implementation of real-time partitioned convolution on a DSP board,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, Oct. 2003, pp. 71–74.
[5] W. G. Gardner, “Efficient Convolution without Input-Output Delay,” J. Audio Eng. Soc., vol. 43, no. 3, pp. 127–136, Mar. 1995.
[6] G. Garcia, “Optimal Filter Partition for Efficient Convolution with Short Input/Output Delay,” in Proc. 113th Audio Engineering Society Convention (AES’02), Los Angeles, CA, USA, Oct. 2002.
[7] W.-C. Lee, C.-H. Yang, C.-M. Liu, and J.-I. Guo, “Perceptual Convolution for Reverberation,” in Proc. 115th Audio Engineering Society Convention (AES’03), New York, USA, Nov. 2003.
[8] ITU-R BS. 1534, “Method for subjective listening tests of intermediate audio quality,” 2001.
[9] E. Vincent, “MUSHRAM: A MATLAB interface for MUSHRA listening tests,” 2005.
