Upcoming SlideShare
×

# Chap5 - ADSP 21K Manual

601 views
525 views

Published on

This is a sample of a manual I developed while at Wideband for a software math and science digital signal processing library for the Analog Devices ADSP-21K. It contains the detailed descriptions of the routines and shows the programmers had a complete and useful solution.

0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
601
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
3
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Chap5 - ADSP 21K Manual

1. 1. ADSP-21K Optimized DSP Library User’s ManualCHAPTER 5 Function Descriptions For The ADSP-21K Optimized DSP Library Each function described in the following pages includes the following topics in order to better understand its use: • Name • Description of the functions operation • The algorithm as applicable • Synopsis of function prototype • Domain valid for arguments • Accuracy of the returned value(s) • Execution time in machine cycles • Notes applicable to this function Wideband Computers, Inc. 5-55
2. 2. ADSP-21K Optimized DSP Library User’s Manual acort ( a, c, m, n ) NAME Auto-correlation (Time Domain) DESCRIPTION Computes the time domain auto-correlation of the real elements stored in input vector a[ ]. Values m and n define the number of auto-correlation values to compute. The resulting auto-correlation values are stored in output vector c[ ]. n–i–1 ALGORITHM Ci= ∑ Ai + j • Aj i = { 0, 1, 2, …m – 1 } j= 0 SYNOPSIS void acort ( a, c, m, n ) float *a ; /* Pointer to input vector a[ ] */ float *c ; /* Pointer to output vector c[ ] */ int m ; /* Lag count m */ int n ; /* Number of elements in vector a[ ] */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 31 + 9*M + (M+1) (2*N-M) NOTES The file tacort.c included in the distribution tape provides an example of this func- tion’s use. Note that the lag count m must be less than or equal to the number of floating-point elements (i.e. m ≤ n ).5-56 Wideband Computers, Inc.
3. 3. ADSP-21K Optimized DSP Library User’s Manualacos_wci ( x ) NAME Arc Cosine DESCRIPTION This function computes the arc cosine of a floating-point number, x. The computed value returned from this function is in the range [0 to π ] radians. A domain error is returned if x is not in the range [-1 to +1]. ALGORITHM return = cos –1( x ) SYNOPSIS float acos_wci ( float x ) DOMAIN -1.0 < x < +1.0 ACCURACY 7.75 decimal digits EXECUTION TIME If A <= 0.5 then 55 cycles, Else if A >0.5 then 75 cycles NOTES The file tacos.c included in the distribution tape provides an example of this functions use.acosh_wci ( x ) NAME Inverse Hyperbolic Cosine DESCRIPTION This function computes the inverse hyperbolic cosine of a floating-point number, x. ALGORITHM return = cosh – 1( x ) SYNOPSIS float acosh_wci ( float x ) DOMAIN 1.0 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 72 cycles NOTES The file tacosh.c included in the distribution tape provides an example of this func- tions use. Wideband Computers, Inc. 5-57
4. 4. ADSP-21K Optimized DSP Library User’s Manual alawc ( a, i, c, k, n ) NAME a-Law Compression DESCRIPTION This routine performs an a-law compression on the elements in input vector a and out- puts the compressed results to output vector c. C mk = alaw compression of A mi ALGORITHM m = { 0, 1, 2, …n – 1 } SYNOPSIS void alawc ( a, i, c, k, n ) int *a ; /* Pointer to input vector a */ int i ; /* Element stride for vector i */ int *c ; /* Pointer to output vector c */ int k ; /* Element stride for vector c */ int n ; /* Number of floating-point elements */ DOMAIN 0 to 255 ACCURACY 7.75 decimal digits EXECUTION TIME 49 + 12 * ( N-1 ) NOTES The file talawc.c included in the distribution tape provides an example of this func- tion’s use. The alawc() routine takes a linear 13-bit signed speech sample and compresses it according to CCITT (now ITU) recommendation G.711. The 8-bit compressed sample is output to vector c. This function is found on the serial port hardware for the ADSP-2106x DSP proces- sors.5-58 Wideband Computers, Inc.
5. 5. ADSP-21K Optimized DSP Library User’s Manualalawe ( a, i, c, k, n ) NAME a-Law Expansion DESCRIPTION This routine performs an a-law expansion on the elements in input vector a and out- puts the expanded results to output vector c. C mk = alaw expansion of A mi ALGORITHM m = { 0, 1, 2, …n – 1 } SYNOPSIS void alawe ( a, i, c, k, n ) int *a ; /* Pointer to input vector a */ int i ; /* Element stride for vector i */ int *c ; /* Pointer to output vector c */ int k ; /* Element stride for vector c */ int n ; /* Number of floating-point elements */ DOMAIN 0 to 255 ACCURACY 7.75 decimal digits EXECUTION TIME 46 + 17 * ( N-1 ) NOTES The file talawe.c included in the distribution tape provides an example of this func- tion’s use. The alawe() routine takes an 8-bit compressed speech sample and expands it accord- ing to CCITT (now ITU) recommendation G.711. The 13-bit signed sample is output to vector c. This function is found on the serial port hardware for the ADSP-2106x DSP proces- sors. Wideband Computers, Inc. 5-59
6. 6. ADSP-21K Optimized DSP Library User’s Manual alpha ( df, a, &al, &n ) NAME Kaiser-Bessel Window Shape Parameter DESCRIPTION Computes a Kaiser-Bessel window shape parameter for later use by the kaiser( ) win- dow mutiply library function. The computation is based on the input attenutation specified in input scalar a and the transition width specified in real input scalar df. From this, a count of floating-point elements (output scalar n) and an output window shape parameter (output scalar al) is computed. If A ≤ 21 then al = 0 ALGORITHM Else If 0.4 A < 50 then al = 0.5842 • ( A – 21 ) + 0.07886 • ( A – 21 ) Else If al = 0.1102 • ( A – 8.7 ) Number of Elements n is computed as follows: ( A – 7.95 ) If A > 21 then d = ------------------------ else d = 0.922 - 14.36 n = 1 + ceiling ( d ⁄ df ) n = n + 1 – remainder ( n ⁄ 2 ) SYNOPSIS void alpha ( df, a, &al, &n ) float dm *df ; /* Input transition width in fs units */ float dm a ; /* Input ripple attenutation in dB */ float dm &al ; /* Output alpha window shape parameter */ int &n ; /* Output floating-point element count */ -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits5-60 Wideband Computers, Inc.
7. 7. ADSP-21K Optimized DSP Library User’s Manualalpha ( df, a, &al, &n ) EXECUTION TIME If a >= 50 then 143 Cycles If 21 < a < 50 then 221 Cycles If A <= 21 then 124 Cycles NOTES The file talpha.c included in the distribution tape provides an example of this func- tion’s use. – A ⁄ 20 df = ∆f ⁄ f s, A = ripple attentuation in dB, δ = 10asin_wci ( x ) NAME Arc Sine DESCRIPTION This function computes the arc sine of a floating-point number, x. The computed value returned from this function is in the range [-π/2 to π/2] radians. A domain error is returned if x is not in the range [-1 to +1]. ALGORITHM return = sin – 1( x ) Wideband Computers, Inc. 5-61
8. 8. ADSP-21K Optimized DSP Library User’s Manual asin_wci ( x ) SYNOPSIS float asin_wci ( float x ) DOMAIN - 1.0 < x < +1.0 ACCURACY 7.75 decimal digits EXECUTION TIME If A <= 0.5 then 55 cycles, Else if A >0.5 then 73 cycles NOTES The file tasin.c included in the distribution tape provides an example of this functions use. asinh_wci ( x ) NAME Inverse Hyperbolic Sine DESCRIPTION This function computes the inverse hyperbolic sine of a floating-point number, x. ALGORITHM return = sinh –1( x ) SYNOPSIS float asinh_wci ( float x ) DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 57 cycles NOTES The file tasinh.c included in the distribution tape provides an example of this func- tions use.5-62 Wideband Computers, Inc.
9. 9. ADSP-21K Optimized DSP Library User’s Manualaspec ( a, c, n ) NAME Accumulating Auto-spectrum DESCRIPTION Computes the auto-spectrum of complex input vector a by multiplying vector a by its complex conjugate and adding the resulting real number to the current value of vector c. Vector c must be initialized prior to invoking a series of accumulating auto-spec- trum calls. 2 2 ALGORITHM C m ⇐ C m + Re Am + Im Am m = { 0, 1, 2, …n – 1 } SYNOPSIS void aspec ( a, c, n ) complex *a ; /* Pointer to input vector a */ float *c ; /* Pointer to output vector c */ int n ; /* Element count for vector c */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 28 + 6*N cycles NOTES The file taspec.c included in the distribution tape provides an example of this func- tion’s use. The stride of vectors a and c must always be 1. If you wish to clear the auto-spectrum results before they are added to output vector c use the vclr( ) function. If the results are not cleared using vclr( ), autospectrum results are added to output vector c, thus computing an accumulating autospectrum. Note that input vector a is of type complex, and data arguments supplied to this routine will be treated as interleaved real and imaginary data. Wideband Computers, Inc. 5-63
10. 10. ADSP-21K Optimized DSP Library User’s Manual atan_wci ( x ) NAME Arc Tangent DESCRIPTION This function computes the arc tangent of a floating-point number x. The computed value returned from this function is in the range [-π/2 to +π/2] radians. ALGORITHM return = tan –1( x ) SYNOPSIS float atan_wci ( float x ) DOMAIN - 4.2E+37 < x < +4.2E+37 ACCURACY 7.75 decimal digits EXECUTION TIME 59 cycles NOTES The file tatan.c included in the distribution tape provides an example of this functions use.5-64 Wideband Computers, Inc.
11. 11. ADSP-21K Optimized DSP Library User’s Manualatan2_wci ( y, x ) NAME Arc Tangent 2 Arguments DESCRIPTION This function computes the arc tangent of a floating-point number x. The computed value returned from this function is in the range [-π to +π] radians. –1 y return = tan  --  ALGORITHM - x SYNOPSIS float atan2_wci ( y, x ) float dm y ; /* Input value y */ float dm x ; /* Input value x */ DOMAIN - 4.2E+37 < y/x < +4.2E+37, except x = 0.0 ACCURACY 7.75 decimal digits EXECUTION TIME 76 cycles NOTES The file tatan2.c included in the distribution tape provides an example of this func- tions use. Wideband Computers, Inc. 5-65
12. 12. ADSP-21K Optimized DSP Library User’s Manual atanh_wci ( x ) NAME Inverse Hyperbolic Tangent DESCRIPTION This function computes the inverse hyperbolic tangent of a floating-point number, x. ALGORITHM return = tanh – 1( x ) SYNOPSIS float atanh_wci ( float x ) DOMAIN -1.0 to +1.0 ACCURACY 7.75 decimal digits EXECUTION TIME 59 cycles NOTES The file tatanh.c included in the distribution tape provides an example of this func- tions use.5-66 Wideband Computers, Inc.
13. 13. ADSP-21K Optimized DSP Library User’s Manualbartlett ( a, i, c, k, n ) NAME Bartlett Window DESCRIPTION This function generates a Bartlett window multiply on the elements of input vector a and places the results in output vector c. ALGORITHM  1   m – -- n  2 - C mk = Ami • 1 – ----------------   -- n  1 -  2  m = { 0, 1, 2, …n – 1 } SYNOPSIS void bartlett ( a, i, c, k, n ) float *a ; /* Pointer to input vector a */ int i ; /* Address stride in words for input vector a */ float *c ; /* Pointer to output vector c */ int k ; /* Address stride in words for output vector c */ int n ; /* Element count */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits EXECUTION TIME 44 + 17 * ( N-1 ) cycles NOTES The file tbartlett.c included in the distribution diskette provides an example of this function’s use. The Bartlett window is also known as a triangular window. Wideband Computers, Inc. 5-67
14. 14. ADSP-21K Optimized DSP Library User’s Manual biquad ( x, d, c, y, n ) NAME Bi-Quad IIR Filter DESCRIPTION Using a bi-quad implementation, this function computes an IIR ( Infinite Impulse Response ) filter using coefficients stored in input vector c, delay node points stored in input buffer d, and applied to the elements of input vector x. The results are stored in output vector y. –1 –2 B0 + B1 z + B2 z ALGORITHM H ( z ) = ----------------------------------------------- - –1 –2 1 – A1 z – A2 z where Dm = A2 • Dm – 2 + A1 • Dm – 1 + xm Y m = B2 • Dm – 2 + B1 • Dm – 1 + Dm m = { 0, 1 , 2 , …, n – 1 } SYNOPSIS void biquad ( x, d, c, y, n ) float *x ; /* Pointer to input buffer vector x of length n */ float *d ; /* Pointer to input delay node buff vector d of length 2 */ float *c ; /* Pointer to input coeff buffer vector c of length 5 */ float *y ; /* Pointer to output buffer vector y of length n */ int n ; /* Number of input/output samples to compute */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits5-68 Wideband Computers, Inc.
15. 15. ADSP-21K Optimized DSP Library User’s Manualbiquad ( x, d, c, y, n ) EXECUTION TIME 65 + 13*N NOTES This is a single bi-quad form of an infinite impulse response filter (IIR), defined by the first equation shown above. It is implemented using a delay node buffer d shown in the second and third equation shown above. The coefficients a[ ] and b[ ] are passed in a single array c[ ] given by the following: c [ 0 ] = A2 c [ 1 ] = B2 c [ 2 ] = A1 c [ 3 ] = B1 c [ 4 ] = B0 Prior to executing the filter loop, the two “oldest” delay node values are loaded from buffer d[ ]. When the filter loop has completed (n samples have been processed) the two “newest” delay node values are written to d[ ]. In this way the filter delay node states are retained between calls, allowing filtering on blocks of contiguous samples. The user is responsible for allocating the delay node array and for initializing its ele- ments to zero prior to the first call to biquad( ). Defining d0 = D m d1 = D m – 1 d2 = D m – 2 Then d0 = c0 • d2 + c2 • d1 + xm ym = c1 • d2 + c 3 • d1 + c 4 • d0 d2 = d1 d1 = d0 m = { 0 , 1 , 2, … , n – 1 } The coefficient buffer length is defined symbolically in the file dsppac.h as DSP_BIQUAD_NCOEFF. The delay node buffer length is defined symbolically in the file dsppac.h as DSP_BIQUAD_NDELAY. The number of input samples n must be greater than or equal to 5. The file tbiquad.c included in the distribution tape provides an example of this func- tion’s use. Wideband Computers, Inc. 5-69
16. 16. ADSP-21K Optimized DSP Library User’s Manual blkman ( a, i, c, k, w, h, n ) NAME Blackman Window Multiply DESCRIPTION Multiplies the input vector a[ ] by a Blackman window and stores the result to vector c[ ]. ALGORITHM 2πmi 4πmi C mk = A mi • 0.42 – 0.50 • cos ------------ + 0.08 • cos ------------ - - N N m = { 0, 1, 2, …, n – 1 } SYNOPSIS void blkman ( a, i, c, k, w, h, n ) float dm *a ; /* Pointer to input vector a */ int i ; /* Element stride for vector a */ float dm *c ; /* Pointer to output vector c */ int k ; /* Element stride for vector c */ float pm *w ; /* Pointer to cosine weights array */ int h ; /* Element stride for weights array */ int n ; /* Element count for vector c */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits5-70 Wideband Computers, Inc.
17. 17. ADSP-21K Optimized DSP Library User’s Manualblkman ( a, i, c, k, w, h, n ) EXECUTION TIME 41 + 4*(N-1) cycles NOTES The file tblkman.c included in the distribution tape provides an example of this func- tion’s use. For real-time applications, the Blackman window can be computed once, and a simple multiply used to window data as shown in the variable W ml . The Blackman Win- dow is computed using the winwts( ) function found in the DSP Pac library. The win- wts( ) function computes the weights array using the sin and cosine functions. This array is pointed to by variable w listed in the synopsis section above. The blkman( ) function is a vector function. You may therefore use the stride argu- ments i, k and h to decimate both the input and output for data congruence. For exam- ple, suppose you use winwts( ) to compute the FFT weights for a 16K FFT. This would result in an fftwts array whose length would be 16,384 points. If you were to later decide to compute an FFT of length 1,024 and run a Blackman Window on the results, you would not need to rerun the winwts( ) function to generate new weights. Simply use the old weights and stride by 16 (16,384/1024 = 16) on stride element h to obtain the correct Blackman window FFT weights . In this manner you need only compute winwts( ) once and later us them for varying length FFTs and windowing functions. The cosine arguments are held in input vector w[ ] and can be computed from the win- wts( ) function. Note that larger vector sizes of w[ ] can be used by changing the stride for w[ ]. For example, if w[ ] were computed for a window of size 2,048, but a Black- man Window of 1,024 was needed, use a stride of 2,048/1,024 = 2. Note that the Blackman window has a passband ripple of 0.0017 dB, a maximum stop- band attenuation of 74 dB, and a 57 dB main lobe relative to side lobe. Wideband Computers, Inc. 5-71
18. 18. ADSP-21K Optimized DSP Library User’s Manual blkmanh ( a, i, c, k, w, h, n ) NAME Blackman-Harris Window Multiply DESCRIPTION Multiplies the input vector a[ ] by a Blackman-Harris window and stores the result to output vector c[ ]. 2πmi 4πmi 6πmi ALGORITHM C mk = A mi • 0.35875 – 0.48829 • cos ------------ + 0.14128 • cos ------------ – 0.01168 • cos ------------ - - - N N N m = { 0, 1, 2, …, n – 1 } SYNOPSIS void blkmanh ( a, i, c, k, w, h, n ) float dm *a ; /* Pointer to input vector a */ int i ; /* Element stride for vector a */ float dm *c ; /* Pointer to output vector c */ int k ; /* Element stride for vector c */ float pm *w ; /* Pointer to cosine weights array */ int h ; /* Element stride for weights array */ int n ; /* Element count for vector c */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits5-72 Wideband Computers, Inc.
19. 19. ADSP-21K Optimized DSP Library User’s Manualblkmanh ( a, i, c, k, w, h, n ) EXECUTION TIME 54 + 6*(N-1) cycles NOTES The file tblkmanh.c included in the distribution tape provides an example of this function’s use. For real time applications, the Blackman-Harris window can be computed once, and a simple multiply used to window data, as shown in the variable W ml . The Blackman- Harris Window is computed using the winwts( ) function found in the DSP Pac library. The winwts( ) function computes the weights array using the sin and cosine functions. This array is pointed to by variable w listed in the synopisis section above. The blkmanh function is a vector function. You may therefore use the stride argu- ments i, k and h to decimate both the input and output for data congruence. For exam- ple, suppose you use winwts( ) to compute the FFT weights for a 16K point FFT. This would result in an fftwts array whose length would be 16,384 points. If you were to later decide to compute an FFT of length 1,024 and run a Blackman-Harris Window on the results, you would not need to rerun the winwts( ) function to generate new weights. Simply use the old weights and stride by 16 (16,384/1024 = 16) on stride ele- ment h to obtain the correct window FFT weights . In this manner you need only com- pute winwts( ) once and later us them for varying length FFTs and windowing functions. The cosine arguments are held in input vector w[ ] and can be computed from the win- wts( ) function. Note that larger vector sizes of w[ ] can be used by changing the stride for w[ ]. For example, if w[ ] were computed for a window of size 2,048, but a Black- man Window of 1,024 was needed, use a stride of 2,048/1,024 = 2. Note that the Blackman-Harris window has a passband ripple of 0.0017 dB, a maxi- mum stopband attenuation of 74 dB, and a 57 dB main lobe relative to side lobe. Wideband Computers, Inc. 5-73
20. 20. ADSP-21K Optimized DSP Library User’s Manual cacort ( a, c, m, n ) NAME Complex Auto-Correlation (Time Domain) DESCRIPTION Computes the time domain auto-correlation of the complex elements stored in input vector a[ ]. Values m and n define the number of auto-correlation values to compute. The resulting auto-correlation values are stored in output complex vector c[ ]. n–i–1 ALGORITHM Ci= ∑ Ai + j • Aj i = { 0, 1, 2, …m – 1 } j=0 SYNOPSIS void cacort ( a, c, m, n ) complex dm *a ; /* Pointer to input vector a[ ] */ complex dm *c ; /* Pointer to output vector c[ ] */ int m ; /* Lag count m */ int n ; /* Number of elements in vector a[ ] */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 39 + ( 9 + 5 * n ) * n NOTES The file tacort.c included in the distribution tape provides an example of this func- tion’s use. Note that the lag count m must be less than or equal to the number of floating-point elements (i.e. m ≤ n ). The strides of vectors a[ ] and c [ ] must be 1.5-74 Wideband Computers, Inc.
21. 21. ADSP-21K Optimized DSP Library User’s Manualccdotpr ( a, i, b, j, c, k, n ) NAME Complex Dot Product Multiply by Conjugate DESCRIPTION This function computes the complex dot product of complex input vector a by the complex conjugate of input vector b and stores the results in complex output vector c. This can be alternatively expressed as C=AB*. ALGORITHM n–1 Re { C } = ∑ Re{ Ami } • Re { Bmj } + Im { Ami } • Im{ Bmj } m= 0 n–1 Im { C } = ∑ –Re{ Ami } • Im { Bmj } + Im { Ami } • Re { Bmj } m= 0 m = { 0, 1, 2…n – 1 } SYNOPSIS void ccdotpr ( a, i, b, j, c, k, n ) complex *a ; /* Pointer to complex input vector a */ int i ; /* Address stride in words for input vector a */ complex *b ; /* Pointer to complex input vector b */ int j ; /* Address stride in words for input vector b */ complex *c ; /* Pointer to complex output vector c */ int k ; /* Address stride in words for output vector c */ int n ; /* Element count */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits EXECUTION TIME 64 + 4*(N-1) cycles NOTES The file tccdotpr.c included in the distribution diskette provides an example of this function’s use. Wideband Computers, Inc. 5-75
22. 22. ADSP-21K Optimized DSP Library User’s Manual ccmmul ( a, b, x, y, b, z, c ) NAME Complex Matrix Multiply By Congugate of Complex Matrix DESCRIPTION This function computes the multiplication of the conjugate of complex input matrix a [ ] [ ] times the elements of complex input matrix b[ ] [ ]. The dimensions of com- plex input matrix a[ ] [ ] are x and y, while the dimensions of complex input matrix b[ ] [ ] are defined by input scalars y and z. The results are stored in complex output matrix c[ ] [ ], which is of dimensions x and z. ALGORITHM y Re ( C ij ) = ∑ [ ( Re )Aik • ( Re )Bkj + ( Im )Aik • ( Im )Bkj ] k=1 y Im(C ij ) = ∑ [ ( Re )C ik • ( Im )B kj – ( Re )B kj • ( Im )A ik ] k=1 for i = { 0, 1, …x } for j = { 0, 1, …z } SYNOPSIS void ccmmul( a, x, y, b, z, c ) complex dm *a ; /* Pointer to complex input matrix a[ ][ ] */ int x ; /* Number of rows in complex matrix a[ ][ ] */ int y ; /* Number of columns in matrix a[ ][ ] And */ /* Number of rows in complex matrix b[ ][ ] */ complex dm *b ; /* Pointer to complex input matrix b[ ][ ] */ int z ; /* Number of columns in matrix b[ ][ ] */ complex dm *c ; /* Pointer to complex output matrix c[ ][ ] */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits5-76 Wideband Computers, Inc.
23. 23. ADSP-21K Optimized DSP Library User’s Manualccmmul ( a, b, x, y, b, z, c ) EXECUTION TIME 62 + ( 6 + ( 12 + 7 * Y ) * Z ) * X cycles NOTES The file tccmmul.c included in the distribution diskette provides an example of this function’s use. a[x][y] = 1, 1 2, 2 3, 3 4, 4 5, 5 6, 6 7, 7 8, 8 9, 9 10, 10 11, 11 12, 12 1, 2 3, 4 5, 6 b[y][z] = 7, 8 9, 10 11, 12 13, 14 15, 16 17, 18 19, 20 21, 22 23, 24 x = 3, y = 4, z = 3 ; ccmmul ( a, x, y, b, z, c ) ; The resulting values in output matrix c [ ] [ ] would be as follows: c[x][y] = 270, 10 310, 10 350, 10 606, 26 610, 26 814, 26 942, 42 1110, 42 1278, 42 The storage methodology for matrices is by rows. Matrices can be thought of as one long array (vector) where the beginning of each row is offset by the number of col- umns. Wideband Computers, Inc. 5-77
24. 24. ADSP-21K Optimized DSP Library User’s Manual ccmsmul ( a, x, y, b, c ) NAME Complex Scalar-Complex Congugate Matrix Multiplication DESCRIPTION This function computes the multiplication of the conjugate of the complex input matrix a[ ] [ ] times complex input scalar b. The dimensions of complex input matrix a[ ] [ ] are x and y. The results are stored in complex output matrix c[ ] [ ], which is of dimensions x and y. ALGORITHM Cxy = B • Axy SYNOPSIS void ccmsmul( a, x, y, b, c ) complex dm *a ; /* Pointer to complex input matrix a[ ][ ] */ int x ; /* Number of rows in complex matrix a[ ][ ] */ int y ; /* Number of columns in matrix a[ ][ ] */ complex dm *b ; /* Pointer to complex input scalar b */ complex dm *c ; /* Pointer to complex output matrix c[ ][ ] */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits5-78 Wideband Computers, Inc.
25. 25. ADSP-21K Optimized DSP Library User’s Manualccmsmul ( a, x, y, b, c ) EXECUTION TIME 46 + 2 * X * Y cycles NOTES The file tccmsmul.c included in the distribution diskette provides an example of this function’s use. a[x][y] = 1, 2 3, 4 5, 6 7, 8 9, 10 11, 12 13, 14 15, 16 17, 18 19, 20 21, 22 23, 24 25, 26 27, 28 29, 30 b = {8,2} x = 8, y = 7 ; ccmsmul ( a, x, y, b, c ) ; The resulting values in output matrix c [ ] [ ] would be as follows: c[x][y] = 12, – 14 32, – 26 52, – 38 72, – 50 92, – 62 112, – 74 132, – 86 152, – 98 172, – 110 192, – 122 212, – 134 232, – 146 252, – 158 272, – 170 292, – 182 The storage methodology for matrices is by rows. Matrices can be thought of as one long array (vector) where the beginning of each row is offset by the number of col- umns. Wideband Computers, Inc. 5-79
26. 26. ADSP-21K Optimized DSP Library User’s Manual cccort ( a, b, c, m, n ) NAME Complex Cross-Correlation (Time Domain) DESCRIPTION Computes the time domain (real) cross-correlation of the time domain (real) elements stored in complex input vectors a[ ] and b[ ]. The result is stored in complex output vector c [ ]. Values m and n define the number of cross-correlation values to compute. The implementation uses a time domain technique. n–i–1 ALGORITHM Ci = ∑ Ai + j • Bj i = { 0, 1, 2, …, m – 1 } j=0 SYNOPSIS void cccort ( a, b, c, m, n ) complex dm *a ; /* Pointer to input vector a[ ] */ complex dm *b ; /* Pointer to input vector b[ ] */ complex dm *c ; /* Pointer to output vector c[ ] */ int m ; /* Lag count m */ int n ; /* Number of elements in vector c[ ] */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 41 + ( 9 + 5 * n ) * m NOTES The file tcccort.c included in the distribution tape provides an example of this func- tion’s use. Note that the lag count must be less than or equal to the number of floating-point ele- ments (i.e. m ≤ n ). The strides of vectors a[ ], b[ ], and c[ ] must always be 1.5-80 Wideband Computers, Inc.
27. 27. ADSP-21K Optimized DSP Library User’s Manualccort ( a, b, c, m, n ) NAME Cross-Correlation (Time Domain) DESCRIPTION Computes the time domain (real) cross-correlation of the time domain (real) elements stored in input vectors a[ ] and b[ ]. The result is stored in output real vector c [ ]. Val- ues m and n define the number of cross-correlation values to compute. The implemen- tation uses a time domain technique. n–i–1 ALGORITHM Cm = ∑ Ai + j • Bj i = { 0, 1, 2, …, m – 1 } j=0 SYNOPSIS void ccort ( a, b, c, m, n ) float *a ; /* Pointer to input vector a[ ] */ float *b ; /* Pointer to input vector b[ ] */ float *c ; /* Pointer to output vector c[ ] */ int m ; /* Lag count m */ int n ; /* Number of elements in vector c[ ] */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 32 + 9 * M + (M+1)(2*N-M) NOTES The file tccort.c included in the distribution tape provides an example of this func- tion’s use. Note that the lag count must be less than or equal to the number of floating-point ele- ments (i.e. m ≤ n ). The strides of vectors a, b, and c must always be 1. Wideband Computers, Inc. 5-81
28. 28. ADSP-21K Optimized DSP Library User’s Manual cdesamp ( data, coeff, output, d, n, p ) NAME Complex Decimating Finite Impulse Response (FIR) Filter DESCRIPTION The function computes the convolution of complex vectors data [ ] and coeff [ ] plac- ing the results in complex vector output [ ]. The number of output samples n and the number of coefficients p may be dissimilar. n elements will be written to output [ ]. Complex vector data [ ] represents the real and imaginary (I and Q) components of the input data respectively. Likewise, complex vector coeff [ ] represents the real and imaginary ( I and Q) components of the coefficient data. A complex multiply and add is performed to compute the convolutional output. The decimation factor d is used to stride the next starting point in data [ ]. p–1 ALGORITHM Output [ i ] = ∑ data [ i • d + j ] • coeff [ p – j – 1 ] j=0 i = { 0, 1, 2…n – 1 } SYNOPSIS void cdesamp ( data, coeff, output, d, n, p ) complex dm *data ; /* Complex input data ( len n+p-1 ) */ complex pm *coeff ; /* Complex coefficients ( len p ) */ complex dm *output ; /* Complex output data ( len n ) */ int d ; /* Decimation factor */ int n ; /* Number of output samples */ int p ; /* Number of coefficients */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits5-82 Wideband Computers, Inc.
29. 29. ADSP-21K Optimized DSP Library User’s Manualcdesamp ( data, coeff, output, d, n, p ) EXECUTION TIME 36 + ( 7 + 5 * p ) * n cycles NOTES The file tcdesamp.c included in the distribution tape provides an example of this func- tion’s use. The number of filter output samples to generate can be obtained as follows: n = ( ndata – p ) ⁄ d + 1 where ndata is the number of elements in data[ ]. A complex correlation can be performed by reversing the order of the coefficients vec- tor. Wideband Computers, Inc. 5-83
30. 30. ADSP-21K Optimized DSP Library User’s Manual cdotpr ( a, i, b, j, c, k, n ) NAME Complex Dot Product DESCRIPTION This function computes the complex dot product of complex input vector a and com- plex input vector b and stores the results in complex output vector c. This can altena- tively thought of as C = A • B . ALGORITHM n–1 Re { C } = ∑ Re { Ami } • Re { Bmj } – Im{ Ami } • Im{ Bmj } m= 0 n–1 Im { C } = ∑ Re{ Ami } • Im { Bmj } + Im { Ami } • Re { Bmj } m= 0 m = { 0, 1, 2…n – 1 } SYNOPSIS void cdotpr ( a, i, b, j, c, k, n ) complex *a ; /* Pointer to complex input vector a */ int i ; /* Address stride in words for input vector a */ complex *b ; /* Pointer to complex input vector b */ int j ; /* Address stride in words for input vector b */ complex *c ; /* Pointer to complex output vector c */ int k ; /* Address stride in words for output vector c */ int n ; /* Element count */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits EXECUTION TIME 64 + 4*(N-1) cycles NOTES The file tcdotpr.c included in the distribution diskette provides an example of this function’s use.5-84 Wideband Computers, Inc.
31. 31. ADSP-21K Optimized DSP Library User’s Manualceil_wci ( x ) NAME Round Up to Nearest Integer DESCRIPTION This function computes the smallest integral value greater than or equal to the float- ing-point number x. A floating-point representation of this integer value is returned. ALGORITHM return = smallest int ≥ x SYNOPSIS float ceil_wci ( float x ) DOMAIN -3.4E+38 to 3.40E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 20 cycles NOTES The file tceil.c included in the distribution tape provides an example of this functions use. Wideband Computers, Inc. 5-85
32. 32. ADSP-21K Optimized DSP Library User’s Manual cfft ( xr, xi, wr, wi, wstr, yr, yi, n ) NAME Fast Fourier Transform Of Complex Input Data DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in com- plex input vector a. The results are stored in complex output vector c. n–1 – i2πmk ⁄ n ALGORITHM Cm = ∑ Ake m = { 0, 1, 2, …, n – 1 } k=0 SYNOPSIS void cfft ( xr, xi, wr, wi, wstr, yr, yi, n ) float dm *xr ; /* Pointer to real input data */ float dm *xi ; /* Pointer to imaginary input data */ float pm *wr ; /* Pointer to cosine table */ float dm *wi ; /* Pointer to sine table */ int wstr ; /* Cosine/sine table stride */ float dm *yr ; /* Pointer to real output data */ float pm *yi ; /* Pointer to imaginary output data */ int n ; /* FFT Size (In Complex Elements) */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME See Attached Table Below5-86 Wideband Computers, Inc.
33. 33. ADSP-21K Optimized DSP Library User’s Manualcfft ( xr, xi, wr, wi, wstr, yr, yi, n ) NOTES This is a radix-2 Fast Fourier Transform using parallel data memory/program memory data accesses to maximize the throughput on the 21020/60/62 processor. The complex input data is separated into real and imaginary parts, xr and xi. These vectors must be aligned on an address which is an integer multiple of the FFT size, as required for 21K bit-reverse addressing. The input vectors are both in data memory; the imagi- nary data is bit-reversed into program memory at the beginning of the routine. The number of elements n supplied to the algorithm must be an integral power of two and a minimum of 32. The complex output is separated into real and imaginary parts, yr and yi. These vec- tors may have arbitrary address alignment; however yr is in the data memory and yi is in the program memory. Vectors xr and xi must be in data memory and each must be aligned to an integral multiple of n. Vectors wr and wi are in program memory and data memory respectively and are given the values: wr [ k ] = cos [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )program memory wi [ k ] = sin [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 )data memory The weight stride wst allows cfft() to be called with varying sizes n from a single set of weights.These weights are generated using the fftwts() function. This precomputed FFT weight approach was implemented in order to ensure accurate results and boost the available cfft() dynamic range to approximately 130 dB for longer length (>16K) FFTs. This is accomplished by using an implementation that does not rely on a recursive call to a sin/cosine approximation routine, as found in other implementations. Rather, the FFT weights are precomputed accurately using the fftwts() function. This is sufficient for A/D converters with bit lengths up to 22 bits. The number of elements n must be an integral power of two and a minimum of 32. Vector yr is in data memory and has a minimum size of n. Vector yi is in program memory and has a minimum size of n. The file tcfft.c included in the distribution tape provides an example of this function’s use. Wideband Computers, Inc. 5-87
34. 34. ADSP-21K Optimized DSP Library User’s Manual cfft ( xr, xi, wr, wi, wstr, yr, yi, n ) SPECIAL NOTES Previous users have sometimes reported problems associated with implementing inter- rupt service routines (ISRs), when used in conjunction with the FFT routines ( cfft( ), cffti( ), rfft( ), rffti( ) ). Observations related to the Wideband technical staff typically include a description of the Wideband routine executing perfectly, but unable to return to an exact state after being interrupted by the ISR ( what is described as a “tumble into the weeds.” ) The Wideband Fast Fourier transforms, both complex and real, forward and inverse, use the built-in bit reversing and circular addressing capabilites of the SHARC archi- tecture. Also, other routines such as some of the FIR filters use the SHARC’s internal circular addressing capabilities. End users are usually cognizant that their ISR calling routine is responsible for saving and restoring the registers of the Wideband routines. However, end users sometimes forget to save and restore ( push and pop ) the mode 1 regiser, which is associated with bir reversing and the B ( base ) and L ( length ) registers associated with circular addressing. In such circumstances where they are not saved and restored by the ISR they are unable to return the proper length parameter ( L Register ) used for circular addressing or the proper mode ( Mode 1 Register ) used in Bit Reversing. This results in the strange manefestations users sometimes report. To properly save and restore the above mentioned registers in an ISR, refer to page 4- 21, section 4.3 of the Analog Devices ADSP-21000 Family C Tools Manual (#31- 000005-08, dated August 95) which references examples of in line assembly code within C code to save and restore registers. For a detailed review of the relationships between the various FFT functions and how to use them with one another, see the final section of Chapter 4.5-88 Wideband Computers, Inc.
35. 35. ADSP-21K Optimized DSP Library User’s ManualPerformance IssuesThe inital timing shown below for the 32 point to 4,096 point FFTs were timed using theAnalog Devices simulator.Performance Timings For Complex FFTs Number of Points Processor Cycles 8 See cfft8( ) function 16 See cfft16( ) function 32 771 Cycles 64 1,274 Cycles 128 2,368 Cycles 256 4,724 Cycles 512 10,060 Cycles 1,024 21,618 Cycles 2,048 46,744 Cycles 4,096 101,054 Cycles 8,192 217,828 Cycles 16,384 467,722 Cycles 32,768 1,000,240 Cycles 65,536 2,130,774 Cycles Wideband Computers, Inc. 5-89
36. 36. ADSP-21K Optimized DSP Library User’s Manual cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n ) NAME Complex 2-Dimensional Fast Fourier Transform DESCRIPTION Computes a 2-Dimensional Fast Fourier Transform of the complex input elements stored in vector a[ ]. The results are stored in complex output vector c[ ]. n–1n–1 – 2 πj ( ( r ⋅ R + c ⋅ C ) ⁄ n ) ALGORITHM C r, c = ∑ ∑ Ake r = 0c = 0 R = { 0, 1, …n – 1 } C = { 0, 1, …n – 1 } SYNOPSIS void cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n ) float dm *xr ; /* Pointer to real input/output data */ float dm *xi ; /* Pointer to imaginary input/output data */ float pm *wr ; /* Pointer to cosine table */ float dm *wi ; /* Pointer to sine table */ int wstr ; /* Consine/sine Table table */ float dm *tmpdm ; /* Pointer to real output data */ float pm *tmppm ; /* Pointer to imag output data */ int n ; /* CFFT2D Size (Complex Elements n x n) */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits5-90 Wideband Computers, Inc.
37. 37. ADSP-21K Optimized DSP Library User’s Manualcfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n ) EXECUTION TIME 32 x 32 Pts. 44,532 cycles 64 x 64 Pts. 165,364 cycles 128 x 128 Pts. 659,572 cycles Wideband Computers, Inc. 5-91
38. 38. ADSP-21K Optimized DSP Library User’s Manual cfft2d ( xr, xi, wr, wi, wstr, tmpdm, tmppm, n ) NOTES The input data is an nxn complex matric x separated into real and imaginary parts xr and xi stored as follows: Re ( x r, c ) = xr [ r • n + c ] r = { 0, 1, …, n – 1 } c = { 0, 1, …, n – 1 } Im ( x r, c ) = xi [ r • n + c ] r = { 0, 1, …, n – 1 } c = { 0, 1, …, n – 1 } Variables r and c are the row and column numbers. The DFT output replaces the input, and is stored as follows: Re ( F R, C ) = xr [ R • n + C ] R = { 0, 1, …, n – 1 } C = { 0, 1, …, n – 1 } Im ( F R, C ) = xi [ R • n + C ] R = { 0, 1, …, n – 1 } C = { 0, 1, …, n – 1 } A radix-2 Fast Fourier Transform (FFT) algorithm is used to compute the individual row and column DFTs. The number of elements n must be an integral power of two and a minimum of 32. Vectors xr and xi must be in data memory and are adress-aligned to an integral multi- ple of n. Vectors wr and wi must be in program memory and data memory respectively and are pre-computed to be: wr [ k ] = cos [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 ) wi [ k ] = sin [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 ) Vector tmpdm must be in data memory, having a minimum size of n, and be address- aligned to an integral multiple of n. Vector tmppm must be in program memory and have a minimum size of n,and be address-aligned to an integral multiple of n. The file tcfft2d.c included in the distribution tape provides an example of this func- tion’s use.5-92 Wideband Computers, Inc.
39. 39. ADSP-21K Optimized DSP Library User’s Manualcfft8 ( xr, xi, yr, yi ) NAME 8-Point Complex Fast Fourier Transform (Inline) DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in input vector xr and xi. The results are stored in output vector yr and yi. ALGORITHM 7 – 2πj ( m • k ⁄ 8 ) Ym = ∑ Xke m = { 0, 1, 2, …, 7 } k=0 SYNOPSIS void cfft8 ( xr, xi, yr, yi ) float dm *xr ; /* Pointer to real input data */ float dm *xi ; /* Pointer to imaginary input data */ float dm *yr ; /* Pointer to real output data */ float pm *yi ; /* Pointer to imaginary output data */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits Wideband Computers, Inc. 5-93
40. 40. ADSP-21K Optimized DSP Library User’s Manual cfft8 ( xr, xi, yr, yi ) EXECUTION TIME 184 Cycles NOTES This is an 8-point radix-2 Fast Fourier Transform using parallel data memory/program memory data accesses to maximize the throughput on the 21020/60/62 processor. The complex input data is separated into real and imaginary parts, xr and xi. These vectors must be aligned on an address which is an integer multiple of the FFT size, as required for 21K bit-reverse addressing. The input vectors are both in data memory; the imaginary data is bit-reversed into program memory at the beginning of the routine. This algorithm utilizies a decimation in time approach. As the cffti( ) function requires a minimum of 32-points as input, there is no corresponding inverse algorithm for this routine. The complex output is separated into real and imaginary parts, yr and yi. These vectors may have arbitrary address alignment; however yr is in the data mem- ory and yi is in the program memory. •Vectors xr and xi are defined in cfft8dta.asm using the dm_align segment to ensure address alignment. For a detailed review of the relationships between the various FFT functions and how to use them with one another, see the final section of Chapter 4. The file tcfft8.c included in the distribution tape provides an example of this func- tion’s use.5-94 Wideband Computers, Inc.
41. 41. ADSP-21K Optimized DSP Library User’s Manualcfft16 ( xr, xi, yr, yi ) NAME 16-Point Complex Fast Fourier Transform (Inline) DESCRIPTION Computes the Fast Fourier Transform of the complex input elements stored in input vector xr and xi. The results are stored in output vector yr and yi. 15 ∑ Xke ALGORITHM – 2πj16 Ym = m = { 0, 1, 2, …, 15 } k=0 SYNOPSIS void cfft16 ( xr, xi, yr, yi ) float dm *xr ; /* Pointer to real input data */ float dm *xi ; /* Pointer to imaginary input data */ float dm *yr ; /* Pointer to real output data */ float pm *yi ; /* Pointer to imaginary output data */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits Wideband Computers, Inc. 5-95
42. 42. ADSP-21K Optimized DSP Library User’s Manual cfft16 ( xr, xi, yr, yi ) EXECUTION TIME 388 Cycles NOTES This is an 16-point radix-2 Fast Fourier Transform using parallel data memory/pro- gram memory data accesses to maximize the throughput on the 21020/60/62 proces- sor. The complex input data is separated into real and imaginary parts, xr and xi. These vectors must be aligned on an address which is an integer multiple of the FFT size, as required for 21K bit-reverse addressing. The input vectors are both in data memory; the imaginary data is bit-reversed into program memory at the beginning of the routine. This algorithm utilizies a decimation in time approach. As the cffti( ) function requires a minimum of 32-points as input, there is no corresponding inverse algorithm for this routine. The complex output is separated into real and imaginary parts, yr and yi. These vectors may have arbitrary address alignment; however yr is in the data mem- ory and yi is in the program memory. •Vectors xr and xi are defined in cfft16dt.asm using the dm_align segment to ensure address alignment. For a detailed review of the relationships between the various FFT functions and how to use them with one another, see the final section of Chapter 4. The file tcfft16.c included in the distribution tape provides an example of this func- tion’s use.5-96 Wideband Computers, Inc.
43. 43. ADSP-21K Optimized DSP Library User’s Manualcffti ( xr, xi, wr, wi, wstr, yr, yi, n ) NAME Inverse Complex FFT DESCRIPTION Computes the Inverse Fast Fourier Transform of the input elements stored in vectors xr and xi. The results are stored in complex output vector c. Note the Inverse FFT is the same as the Forward FFT except that the sign of the imaginary components of the twiddle factors is negated. The Inverse FFT swaps the real and imaginary input data, perform the Forward FFT with the same weights table, and swaps the real and imagi- nary ouptut data. Scaling by 1/N is then performed. n–1 i2πmk ⁄ n ∑ Ak e ALGORITHM 1 C m = -- - m = { 0, 1, 2, …, n – 1 } n k=0 SYNOPSIS void cffti ( xr, xi, wr, wi, wstr, yr, yi, n ) float dm *xr ; /* Pointer to real input data */ float dm *xi ; /* Pointer to imaginary input data */ float pm *wr ; /* Pointer to cosine table */ float dm *wi ; /* Pointer to sine table */ int wstr ; /* Cosine/sine table stride */ float dm *yr ; /* Pointer to real output data */ float pm *yi ; /* Pointer to imaginary output data */ int n ; /* FFT Size (In Complex Elements) */ DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 22,650 Cycles @ 1,024 Points - Data and Program In On-Board Cache Wideband Computers, Inc. 5-97
44. 44. ADSP-21K Optimized DSP Library User’s Manual cffti ( xr, xi, wr, wi, wstr, yr, yi, n ) NOTES This is a radix-2 inverse Fast Fourier Transform using parallel DM/PM data accesses to maximize the throughput on the 21020 processor.The complex input data is sepa- rated into real and imaginary parts, xr and xi. These vectors must be aligned on an address which is an integer multiple of the FFT size, as required for 21K bit- reverse addressing. The input vectors are both in DM; the imaginary data is bit- reversed into PM at the beginning of the routine.The number of elements n must be an integral power of two and a minimum of 32. The complex output is separated into real and imaginary parts, yr and yi. These vec- tors may have arbitrary address alignment; however yr is in the DM and yi is in the PM. Vectors xr and xi mus be in data memory and each must be aligned to an integral multiple of n. Vectors wr and wi are in program memory and data memory respectively and are given the values: wr [ k ] = cos [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 ) wi [ k ] = sin [ 2πk ⁄ wst*n ] k = ( 0, 1, …, wstn ⁄ 2 – 1 ) The weight stride, wst, allows for calling cfft() with varying sizes n from a single set of weights. These weights are generated using the fftwts( ) function. Vector yr is in data memory and has a minimum size of n.Vector yi is in program memory and has a minimum size of n. The file tcfft.c included in the distribution tape provides an example of this function’s use.5-98 Wideband Computers, Inc.
45. 45. ADSP-21K Optimized DSP Library User’s Manualcffti ( xr, xi, wr, wi, wstr, yr, yi, n ) SPECIAL NOTES Previous users have sometimes reported problems associated with implementing inter- rupt service routines (ISRs), when used in conjunction with the FFT routines (cfft ( ), cffti ( ), rfft ( ), rffti ( ) ). Observations related to the Wideband technical staff typi- cally include a description of the Wideband routine executing perfectly, but unable to return to an exact state after being interrupted by the ISR ( a “tumble into the weeds.” ) The Wideband Fast Fourier transforms, both complex and real, forward and inverse, use the built-in bit reversing and circular addressing capabilites of the SHARC archi- tecture. Also, other routines such as some of the FIR filters use the SHARC’s internal circular addressing capabilities. End users are usually cognizant that their ISR calling routine is responsible for saving and restoring the registers of the Wideband routines. However, end users sometimes forget to save and restore ( push and pop ) the mode 1 regiser, which is associated with bir reversing and the B ( base ) and L ( length ) registers associated with circular addressing. In such circumstances where they are not saved and restored by the ISR they are unable to return the proper length parameter ( L Register ) used for circular addressing or the proper mode ( Mode 1 Register ) used in Bit Reversing. This results in the strange manefestations users sometimes report. To properly save and restore the above mentioned registers in an ISR, refer to page 4- 21, section 4.3 of the Analog Devices ADSP-21000 Family C Tools Manual (#31- 000005-08, dated August 95) which references examples of in line assembly code within C code to save and restore registers. For a detailed review of the relationships between the various FFT functions and how to use them with one another, see the final section of Chapter 4. Wideband Computers, Inc. 5-99
46. 46. ADSP-21K Optimized DSP Library User’s ManualTABLE 8 Table of Inverse Complex FFT Timing Number Processor of Points Cycles 32 868 Cycles 64 1,435 Cycles 128 2,657 Cycles 256 5,319 Cycles 512 11,117 Cycles 1,024 23,699 Cycles 2,048 50,873 Cycles 4,096 109,281 Cycles 8,192 234,244 Cycles 16,384 500,525 Cycles 32,768 1.072,560 Cycles 65,536 2,288,128 Cycles5-100 Wideband Computers, Inc.
47. 47. ADSP-21K Optimized DSP Library User’s Manualcfir ( ii, qq, ci, cq, oi, oq, d, n, p ) NAME Complex Finite Impulse Response Filter DESCRIPTION The function cfir( ) computes the convolution of vectors ii[ ], iq[ ], ci[ ], and cq[ ] placing the results in oi[ ] and oq[ ] respectively. The number of output samples n and the number of coefficients p may be dissimilar. n elements will be writtento oi[ ] and oq[]. The vectors ii[ ] and iq[ ] represent the real and imaginary (I and Q) components of the input data respectively. Likewise,the vectors ci[ ] and cq[ ] represent the real and imaginary (I and Q) components of the coefficient data. A complex multiply and add is performed to compute the convolutional output. The decimation factor d is used to stride the next starting ii[ ] and iq[ ] data. p= 1 ALGORITHM C[ i ]= ∑ a[a • d + j] • b[p – j – 1] j=0 m = { 0, 1, 2, …, n – 1 } where a [ ] compromises complex components ii [ ] and iq [ ] b [ ] compromises complex components ci [ ] and cq [ ] c [ ] compromises complex components oi [ ] and oq [ ] SYNOPSIS void cfir ( ii, qq, ci, cq, oi, oq, d, n, p ) */ float dm *ii ; Input samples for I data ( len n+p-1 ) */ */ float dm *iq ; Input samples for Q data ( len n+p-1 ) */ */ float pm *ci ; Coefficients for I data ( len p ) */ */ float pm *cq ; Coefficients for Q data ( len p ) */ */ float dm *oi ; Output samples for I data ( len n ) */ */ float dm *oq ; Output samples for Q data ( len n ) */ */ int d ; Decimation factor */ */ int n ; Number of output samples */ */ int p ; Number of coefficients */ Wideband Computers, Inc. 5-101
48. 48. ADSP-21K Optimized DSP Library User’s Manual cfir ( ii, qq, ci, cq, oi, oq, d, n, p ) DOMAIN -3.4E+38 to 3.4E+38 ACCURACY 7.75 decimal digits EXECUTION TIME 59 + ( 9 + 5 * p ) * n cycles NOTES The file tfir.c included in the distribution tape provides an example of this function’s use. The number of filter output samples to generate can be obtainted as follows: ( ndata – p ) n = ---------------------------- d+1 where ndata is the number of elements in ii[ ] and iq[ ]. A correlation can be performed by reversing the order of the coefficients vector.5-102 Wideband Computers, Inc.
49. 49. ADSP-21K Optimized DSP Library User’s Manualchksum ( a, i, type, n ) NAME Perform Checksum DESCRIPTION This function performs a checksum on a memory block. The memory block is defined by the start address a offset by n. The type flag determines whether dm or pm memory is tested ( 1 = dm, 0 = pm). ALGORITHM Return ⇐ Checksum SYNOPSIS void chksum ( a, i, type, n ) int a ; /* Start address of memory */ int i ; /* Memory Stride */ int type ; /* Type of memory to test ( dm or pm ) */ int n ; /* Length of block to be checked */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits EXECUTION TIME 17 + 2 * N cycles NOTES The file tchksum.c included in the distribution diskette provides an example of this function’s use. chksum( ) performs a two’s complement on the sum of the elements within the mem- ory block. The check sum value is returned. Wideband Computers, Inc. 5-103
50. 50. ADSP-21K Optimized DSP Library User’s Manual cmadd ( a, b, x, y, c ) NAME Complex Matrix Addition DESCRIPTION This function computes the addition of complex input matrix a with complex input matrix b and stores the results to complex output matrix c. ALGORITHM C ri11 C ri12 C ri13 A ri11 A ri12 A ri13 B ri11 B ri12 B ri13 = + C ri21 C ri22 C ri23 A ri21 A ri22 A ri23 B ri21 B ri22 B ri23 where ri indicates a real and imaginary component SYNOPSIS void cmadd ( a, b, x, y, c ) complex dm *a ; /* Pointer to input matrix a [ ][ ] */ complex dm *b ; /* Pointer to input matrix b [ ][ ] */ int x ; /* Number of rows in matrix a[ ][ ] & b[ ][ ] */ int y ; /* Number of columns in matrix a[ ][ ]& b[ ][ ]*/ complex dm *c ; /* Pointer to output matrix c [ ][ ] */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits5-104 Wideband Computers, Inc.
51. 51. ADSP-21K Optimized DSP Library User’s Manualcmadd ( a, b, x, y, c ) EXECUTION TIME 32 + 3*X*Y cycles NOTES The file tcmadd.c included in the distribution diskette provides an example of this function’s use. The addition of a complex matrix is mathematically expressed as follows: Real C [ x ] [ y ] = A [ x ] [ y ] Real + B [ x ] [ y ] Real Imaginary C [ x ] [ y ] = A [ x ] [ y ] Imaginary + B [ x ] [ y ] Imaginary An example of the additon of one complex matrix to another is as follows: 1, 2 3.8, 1.7 8.8, 5.5 9.9, 14 7.1, 5 9.3, 1.6 0.4, 1 51, 3.3 0.9, 1 8, 5 2.1, 6 – 3.1, – 1 A[ x][ y]= 9.3, 1 2.5, 1.5 6.9, 9 10, 22.1 1.3, 1.4 0.2, 4.5 0.9, 51.4 1.5, 4.4 9.2, 4 7.8, 1.7 61, 3.4 14.3, 1.4 3.2, 1 8.8, 2 9.9, 3 44.3, 13.3 8.1, 4 6.5, 5 3.2, 6 – 2.3, – 9.9 8.9, 7 2.8, 8 1.7, 9 – 8.1, – 2.2 B [x] [y] = 6.4, 10 11, 1.3 12, 4.5 22.9, – 5.4 6.5, 7 2.1, 8 2.2, 9 32, 9.8 1.1, 4 7.7, 5 4.4, 6 – 2.1, – 0.3 x=6, y=4 4.2, 3 12.6, 3.7 18.7, 8.5 54.2, 27.3 15.2, 9 15.8, 6.6 3.6, 7 48.7, – 6.6 9.8, 8 10.8, 13 3.8, 15 – 11.2, – 3.2 C [x] [y] = 15.7, 11 13.5, 2.8 18.9, 13.5 32.9, 16.7 7.8, 8.4 2.3, 12.5 3.1, 60.4 33.5, 14.2 10.3, 8 15.5, 6.7 65.4, 9.4 12.2, 1.1 Wideband Computers, Inc. 5-105
52. 52. ADSP-21K Optimized DSP Library User’s Manual cmmov ( a, x, y, b ) NAME Complex Matrix Move DESCRIPTION This function moves a source complex input matrix a to a destination complex output matrix b. ALGORITHM C ri11 C ri12 C ri13 A ri11 A ri12 A ri13 ⇐ C ri21 C ri22 C ri23 A ri21 A ri22 A ri23 where ri indicates a real and imaginary component SYNOPSIS void cmmov ( a, x, y, b ) complex dm *a ; /* Pointer to input matrix a [ ][ ] */ int x ; /* Number of rows in matrix a[ ][ ] */ int y ; /* Number of columns in matrix a[ ][ ] */ complex dm *b ; /* Pointer to output matrix b [ ][ ] */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits EXECUTION TIME 13 + ( 2 * X * Y ) cycles NOTES The file tcmmov.c included in the distribution diskette provides an example of this function’s use. The storage methodology for matrices is by rows. Matricies can be thought of as one long array (vector) where the beginning of each row is offset by the length of the col- umn.5-106 Wideband Computers, Inc.
53. 53. ADSP-21K Optimized DSP Library User’s Manualcmmul ( a, x, y, b, z, c ) NAME Complex Matrix Multiplication DESCRIPTION This function computes the multiplication of complex input matrix a times complex input matrix b and stores the results to complex output matrix c. The dimension of complex matrix a [ ] [ ] is x and y and the dimension of complex input matrix b [ ] [ ] is y and z. The resulting complex output matrix c [ ] [ ] is of dimension x and z. ALGORITHM B ri11 B ri12 C ri11 C ri12 A ri11 A ri12 A ri13 = • B ri21 B ri22 C ri21 C ri22 A ri21 A ri22 A ri23 B ri31 B ri32 where ri indicates a real and imaginary component SYNOPSIS void cmmul ( a, x, y, b, z, c ) complex dm *a ; /* Pointer to input matrix a [ ][ ] */ int x ; /* Number of rows in matrix a[ ][ ] */ int y ; /* Number of columns in matrix a[ ][ ] */ /* Number of rows in matrix b[ ][ ] */ complex dm *b ; /* Pointer to input matrix b [ ][ ] */ int z ; /* Number of columns in matrix b[ ][ ] */ complex dm *c ; /* Pointer to output matrix c [ ][ ] */ DOMAIN -3.4 x 1038 to +3.4 x 1038 ACCURACY 7.75 decimal digits Wideband Computers, Inc. 5-107
54. 54. ADSP-21K Optimized DSP Library User’s Manual cmmul ( a, x, y, b, z, c ) EXECUTION TIME 45 + (4 + ( 10 + 5 * Y) * Z) * X cycles NOTES The file tcmmul.c included in the distribution diskette provides an example of this function’s use. The multiplication of a complex matrix is as follows: y C[x ][y ]= ∑ ( Real Sum + Imaginary Sum ) where k=1 Real Sum = A Real • BReal – AImaginary • BImagainary Imaginary Sum = A Real • BImaginary + BReal • A Imaginary The storage methodology for matrices is by rows. Matrices can be thought of as one long array (vector) where the beginning of each row is offset by the length of the col- umn. The first row of a [ ] [ ] times the first column of b [ ] [ ] is the first element of c [ ] [ ] (row 1, column 1). The first row of a [ ] [ ] times the second row of b [ ] [ ] is the sec- ond element of c [ ] [ ] (row 1, column 2 ) ... etc. This algorithm follows the general law of matrix multiplication whereby the number of columns of input matrix a must equal the number of rows of input matrix b. ri indicates that each component of the matrix is composed of a complex number which has both a real and imaginary component. An example of the multipication of one complex matrices by another is as follows: 1, 2 3, 4 5, 6 1, 1 2, 2 3, 3 4, 4 7, 8 9, 10 11, 12 A [ x ] [ y ] = 5, 5 6, 6 7, 7 8, 8 B [y] [z] = 13, 14 15, 16 17, 18 9, 9 10, 10 11, 11 12, 12 19, 20 21, 22 23, 24 x=3, y=4, z=3 – 10, 270 – 10, 310 – 10, 350 C [ ] = – 26, 606 – 26, 710 – 26, 814 – 42, 942 – 42, 1110 – 42, 12785-108 Wideband Computers, Inc.