SlideShare a Scribd company logo
1 of 53
SPIRAL: Current Status
Markus Püschel
Students
Faculty
• José Moura (CMU)
• Jeremy Johnson (Drexel)
• Robert Johnson (MathStar)
• David Padua (UIUC)
• Viktor Prasanna (USC)
• Markus Püschel (CMU)
• Manuela Veloso (CMU)

• Gavin Haentjens (CMU)
• Pinit Kumhom (Drexel)
• Neungsoo Park (USC)
• David Sepiashvili (CMU)
• Bryan Singer (CMU)
• Yevgen Voronenko (Drexel)
• Edward Wertz (CMU)
• Jianxin Xiong (UIUC)
Collaborators
• Christoph Überhuber (TU Vienna)
• Franz Franchetti (TU Vienna)

http://www.ece.cmu.edu/~spiral
Sponsor
Work supported by DARPA (DSO), Applied & Computational
Mathematics Program, OPAL, through grant managed by
research grant DABT63-98-1-0004 administered by the Army
Directorate of Contracting.
Moore’s Law

and High(est) Performance
Scientific Computing
(single processor, off-the-shelf)
Moore’s Law:

 processor-memory bottleneck
 short life cycles of computers
 very complex architectures
• vendor specific
• special instructions (MMX, SSE, FMA, …)
• undocumented features

Consequences for software/algorithms:
 arithmetic cost model not accurate for predicting runtime
 better performance models hard to get
 best code is machine dependent (registers/caches size, structure)
 hand-tuned code becomes obsolete as fast as it is written
 compiler limitations
 full performance requires (in part) assembly coding

Portable performance requires automation
SPIRAL
Automates
Implementation

 cuts development costs
 code less error-prone

Optimization

 systematic exploration of alternatives both at
algorithmic and code level

Platform-Adaptation

 takes advantage of architecture specific features
 porting without loss of performance

of DSP algorithms

 are performance critical

A library generator for highly optimized
signal processing algorithms
SPIRAL system

goes for a coffee

ian
tic
a

controls
algorithm generation

Formula Generator

fast algorithm
as SPL formula

ert
Exp m
SPL Compiler er
ram
rog
P

controls
implementation options

C/Fortran/SIMD
code

platform-adapted
implementation

Search Engine

hem
at
M

specifies

runtime on given platform

comes back

(or an espresso for small transforms)

SPIRAL

DSP transform

user
DSP Transform

Algorithm
DSP Algorithms: Example 4-point DFT
Cooley/Tukey FFT (size 4):

1 1 1 1  1
1 i − 1 − i  0

=
1 − 1 1 − 1 1

 
1 − i − 1 i  0

Fourier transform

0 1 0  1
1 0 1  0

0 − 1 0  0

1 0 − 1 0

0
1
0
0

0
0
1
0

0 1 1
0 1 − 1

0  0 0

i  0 0

0 0  1
0 0  0

1 1  0

1 − 1 0

0
0
1
0

0
1
0
0

Diagonal matrix (twiddles)

DFT4 = ( DFT2 ⊗ I 2 ) ⋅ T24 ⋅ ( I 2 ⊗ DFT2 ) ⋅ L4
2
Kronecker product

Identity

Permutation

 product of structured sparse matrices
 mathematical notation

0
0

0

1
DSP Algorithms: Terminology
Transform

DFTn

Rule

DFTnm → ( DFTn ⊗ I m ) ⋅ D ⋅ ( I n ⊗ DFTm ) ⋅ P

parameterized matrix

• a breakdown strategy
• product of sparse matrices

DFT 8

Ruletree
DFT 2

DFT 4
DFT 2

Formula

DFT 2

• recursive application of rules
• uniquely defines an algorithm
• efficient representation
• easy manipulation

DFT8 = ( F2 ⊗ I 4 ) ⋅ D ⋅ ( I 2 ⊗ ( I 2 ⊗ F2 ) ) ⋅ P
• few constructs and primitives
• uniquely defines an algorithm
• can be translated into code
DSP Transforms
discrete Fourier transform
Walsh-Hadamard transform

DFTn = [ exp( 2kliπ / n ) ]
WHT2k = DFT2 ⊗ ⊗ DFT2
DCT ( II ) n = [ cos( k ( l + 1 / 2 )π / n ) ]

DCT ( IV ) n = [ cos( ( k + 1 / 2) (l + 1 / 2)π / n ) ]

discrete cosine and sine
Transforms (16 types)
modified discrete
cosine transform

DST ( I ) n = [ sin ( klπ / n ) ]
MDCTn×2 n = [ cos( ( k + ( n + 1) / 2)(l +1 / 2)π / n ) ]

two-dimensional transform

T ⊗T

Others: filters, discrete wavelet transforms, Haar, Hartley, …
Rules = Breakdown Strategies

(

)

DCT2( II ) → diag 1,1 / 2 ⋅ F2

(

)

DCTn( II ) → P ⋅ DCTn(/II ) ⊕ DCTn(/IV ) ⋅ ( I n / 2 ⊗ F2 )
2
2

base case
Q

recursive

DCTn( IV ) → S ⋅ DCTn( II ) ⋅ D

translation

DCTn( IV ) → M 1  M r
DFTn → B ⋅ ( DCTn(/I2) ⊕ DSTn(/I2) ) ⋅ C

iterative
recursive

DFTnm → ( DFTn ⊗ I m ) ⋅ D ⋅ ( I n ⊗ DFTm ) ⋅ P

recursive

Fn (h) → ( I n / d ⊗ k I d + k ) ⋅ ( I n / d ⊗ Fd (h))

recursive

Fn (h) → Circ (h ) ⋅ E

recursive

DWTn (W ) → ( DWTn / 2 (W ) ⊕ I n / 2 ) ⋅ P ⋅ ( I n / 2 ⊗ k W ) ⋅ E

recursive

WHT2n → ∏ ( I 2n1+K+ni −1 ⊗ WHT2ni ⊗ I 2ni +1+K+nt )

iterative/
recursive

MDCTn×2 n → S ⋅ DCTn( IV ) ⋅ P

translation

n

i =1
Formula for a DCT, size 16
DSP Transform

Algorithm (Formula)

Implementation
Formulas in SPL
••••
( compose
( diagonal ( 2*cos(1/16*pi) 2*cos(3/16*pi) 2*cos(5/16*pi) 2*cos(7/16*pi) ) )
( permutation ( 1 3 4 2 ) )
( tensor
( I 2 )
( F 2 )
)
( permutation ( 1 4 2 3 ) )
( direct_sum
( compose
( F 2 )
( diagonal ( 1 sqrt(1/2) ) )
)
( compose
( matrix
( 1 1 0 )
( 0 (-1) 1 )
)
( diagonal ( cos(13/8*pi)-sin(13/8*pi) sin(13/8*pi) cos(13/8*pi)+sin(13/8*pi) ) )
( matrix
( 1 0 )
( 1 1 )
( 0 1 )
)
( permutation ( 2 1 ) )

••••
SPL Syntax (Subset)
 matrix operations:
(compose formula formula ...)
(tensor formula formula ...)
(direct_sum formula formula ...)
 direct matrix description:
(matrix (a11 a12 ...) (a21 a22 ...) ...)
(diagonal (d1 d2 ...))
(permutation (p1 p2 ...))
 parameterized matrices:
(I n)
(F n)
 scalars:
1.5, 2/7, cos(..), w(3), pi, 1.2e-04
allows extension of SPL
 definition of new symbols:
(define name formula)
(template formula (i-code-list)
 directives for code generation
#codetype real/complex
controls loop unrolling
#unroll on/off
SPL Compiler, 4-point FFT
(compose (tensor (F 2) (I 2)) (T 4 2)
(tensor (I 2) (F 2)) (L 4 2))

fast algorithm
as
formula
as
SPL program

#codetype
complex
f0
f1
f2
f3
f4
y(1)
y(2)
y(3)
y(4)

=
=
=
=
=
=
=
=
=

x(1) + x(3)
x(1) - x(3)
x(2) + x(4)
x(2) - x(4)
(0.00d0,-1.00d0)*f(3)
f0 + f2
f0 - f2
f1 + f4
f1 - f4

real
r0
r1
r2
r3
r4
r5
r6
r7
y(1)
y(2)
y(3)
y(4)
y(5)
y(6)
y(7)
y(8)

=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=

x(1)
x(1)
x(2)
x(2)
x(3)
x(3)
x(4)
x(4)
r0 +
r1 +
r0 r1 r2 +
r3 r2 r3 +

+ x(5)
- x(5)
+ x(6)
- x(6)
+ x(7)
- x(7)
+ x(8)
- x(8)
r4
r5
r4
r5
r7
r6
r7
r6
SPL Compiler: Summary
SPL Program
SPL Formula

Symbol Definition Template Definition

Parsing
Abstract Syntax Tree

Symbol Table

Template Table

Intermediate Code Generation
I-Code

Intermediate Code Restructuring
I-Code

Optimization
I-Code

Target Code Generation
C, FORTRAN function

Built-in optimizations:
 single static assignment code
 no reuse of temporary vars
 only scalar temporary vars
 constants precomputed
 limited CSE
Extensible through templates
DSP Transform

Algorithm (Formula)

Implementation

Search
Why Search?

DCT, type IV, size 16

~31000 formulas

• maaaany different formulas
• large spread in runtimes, even for modest size
• not due to arithmetic cost
• best formula is platform-dependent
Search Methods available in SPIRAL






Exhaustive Search
Dynamic Programming (DP)
Random Search
Hill Climbing
STEER (similar to a genetic algorithm)
Possible
Sizes
Exhaust Very small

Formulas
Timed

Results

All

Best

DP

All

10s-100s

(very) good

Random

All

User decided

fair/good

Hill Climbing

All

100s-1000s

Good

STEER

All

100s-1000s

(very) good

Search over
• algorithm space and
• implementation options (degree of unrolling)
STEER
Population n:

Mutation

……

Cross-Breeding

expand
differently

Population n+1:
swap
expansions

……

Survival of Fittest
Experimental Results (C code)
search methods
(applicable to all transforms)

high performance code
(compared with FFTW)

different transforms

generated high quality code
SPIRAL System
 Available for download (v3.1): www.ece.cmu.edu/~spiral
 Easy installation (Unix: configure/make; Windows: install shield)
 Unix/Linux and Windows 98/ME/NT/2000/XP
 Current transforms: DFT, DHT, WHT, RHT, DCT/DST type I – IV,
MDCT, Filters, Wavelets, Toeplitz, Circulants
 Extensible
 New version (4.0) in preparation
Recent Work
Learning to Generate Fast Algorithms
• Learns from given dataset (formulas+runtimes) how to design a
fast algorithm (breakdown strategy)
• Learns from a transform of one size, generates the best algorithm
for many sizes
• Tested for DFT and WHT
SIMD Short Vector Extensions

vector length = 4
(4-way)
x

+

 Extension to instruction set architecture
 Available on most current architectures
(SSE on Pentium, AltiVec on Motorola G4)
 Originally for multimedia (like MMX for integers)
 Requires fine grain parallelism
 Large potential speed-up

Problems:
 SIMD instructions are architecture specific
 No common API (usually assembly hand coding)
 Performance very sensitive to memory access
 Automatic vectorization very limited

very difficult to use
Vector code generation from SPL formulas
Naturally vectorizable construct

A ⊗ I4

A

vector length

x

y

(Current) generic construct completely vectorizable:
k

∏ P D ( A ⊗ Iυ ) E Q
i

i

i

i =1

i

i

Pi, Qi
Di, Ei
Ai
ν

permutations
diagonals
arbitrary formulas
SIMD vector length

Vectorization in two steps:
1. Formula manipulation using manipulation rules
2. Code generation (vector code + C code)
Generated Vector Code DFTs: Pentium 4
7

6

Spiral SSE

gflops

5

Intel MKL interl.
4

FFTW 2.1.3
Spiral C and F95

3

Spiral F95 vect
SIMD-FFT

2

1

0
4

5

6

7

8

9

10

11

12

13

14

n
DFT 2^n, Pentium 4, 2.53 GHz, using Intel C compiler 6.0

 speedups (to C code) up to factor of 3.3
 beats hand-tuned vendor library
Generated Vector Code, Other Transforms
WHT
normalized runtime

normalized runtime

2-dim DCT

transform size

transform size

speedups up to factor of 2.5
Flexible Loop Interleaving (Runtime Gain WHT)

Athlon XP: up to 55%

Alpha 21264: up to 70%

Pentium 4: up to 45%

UltraSparc III: up to 60%
Parallel Code Generation: Example WHT
10

PowerPC RS64 III

Speedup

8
6
4

1 thread
8 threads
10 threads

2
0
1

6

11

16

WHT size log(N)

21

26
Parallelized constructs:
In ⊗ A, A ⊗ In
Code Scheduling
Runtime histograms

DFT, size 16 ~6500 formulas

DCT4, size 16 ~16000 formulas

unscheduled
scheduled
Filters and Wavelets
New constructs: row/column overlapped tensor product

A

A

A

I n ⊗k A =

A

I n ⊗k A =

A

A

A
Examples for rules:

Fn (h) → ( I n / d ⊗ k I d + k ) ⋅ ( I n / d ⊗ Fd (h))

DWTn (W ) → ( DWTn / 2 (W ) ⊕ I n / 2 ) ⋅ P ⋅ ( I n / 2 ⊗ k W ) ⋅ E

A
Conclusions


Automatic code generation for the entire domain of
(linear) DSP algorithms



Portable high performance across platforms and
across time



Integration of math (high) level and implementation
(low) level



Intelligence through search and learning in the
space of alternatives
Future Plans


Transforms: Radon, Gabor, Hankel, structured
matrices, …



Target platforms: parallel platforms, DSP
processors, SW/HW architectures, FPGAs, ASICs



Instructions: Vector, FMAs, prefetching, OpenMP,
MPI



Beyond transforms: entire DSP applications



Other domains amenable to SPIRAL approach?
Questions?
http://www.ece.cmu.edu/~spiral
Extra Slides
Generating Parallel Programs



Interpret constructs such as In ⊗ A as parallel operations and
transform formulas to obtain maximal parallelism.



Explore alternative data access patterns mathematically (e.g.
different permutations in matrix factorizations)



Prototype implementation using WHT

 Build on existing sequential package
 SMP implementation using OpenMP (IPDPS’02)

 90% efficiency obtained on 12 processor PowerPC RS64 III
 Distributed memory implementation using MPI (POHLL’02)
Comparison of Parallel DDL Schemes

10

Speedup

PowerPC RS64 III
10 threads
8
6
4
2
0
1

6

11

16

21

WHT size log(N)

26

Best sequential with
DDL
Parallel without DDL
Coarse-grained DDL
Fine-grained DDL
with ID shift
Overall Parallel Speedup

10

PowerPC RS64 III

6

Speedup

8

1 thread
8 threads
10 threads

4
2
0
1

6

11

16

WHT size log(N)

21

26
Performance of Digit Permutations on CRAY T3E

Distribution of Global Tensor Permutations on 128 Processors (shmem put)
1200

Number of tensor permutations

1000

800

600

400

200

0

0

50

100

150
200
250
Bandwidth in MB/sec/node

300

350
Architecture Framework

P4 ( I 8 ⊗ F2 )T4 P4− 1 P3 ( I 8 ⊗ F2 )T3 P3− 1 P2 ( I 8 ⊗ F2 )T2 P2− 1 P1 ( I 8 ⊗ F2 )T1P1− 1 Pd

Host

I/O Interface

Parameters: Dimension,
Pd
I/O Interface

M = Memory
AG = Address Generator
CU = Computation Unit

Interconnection Network

Parameters: Pi, 1 ≤ i ≤ n,
no. of processor

AG
CU AG

CU AG

M

M

CU AG

... M

CU AG

M

Parameters: Dimension,
and Ti, 1 ≤ i ≤ n, no. of
processor 2m

CU
Pease Algorithm Dataflow

L16 ( I 8 ⊗ F2 )T3 L16 L16 ( I 8 ⊗ F2 )T2 L16 L16 ( I 8 ⊗ F2 )T1 L16 ( I 8 ⊗ F2 ) Pd
2
8
4
4
8
2
Stage 0

Stage 1

Stage 2

L16
2

Stage 3
0

0

1

2

2

4

3

6

4

8

5

10

6

12

7

14

8

1

9

3

10

5

11

7

12

9

13

11

14

13

15

15
Optimal Dataflow

L16 ( L82 ⊗ I 2 )( I 8 ⊗ F2 )T3 ( L84 ⊗ I 2 ) L16 L16 ( L82 ⊗ I 2 )( I 8 ⊗ F2 )T2 ( L84 ⊗ I 2 ) L16 L16 ( I 8 ⊗ F2 )T1 L16 ( L84 ⊗ I 2 )( I 8 ⊗ F2 )( L82 ⊗ I 2 ) Pd
2
8
4
4
8
2

Stage 0

Stage 1

Stage 2

Stage 3
Pease v.s. Optimal

Performance comparison of Pease algorithm and
optimal algorithm

no. of clock cycles

12000
10000
8000
Pease

6000

Optimal

4000
2000
0
5

6

7

8

FFT size

Stage
1
2
3
4
5
6

Local
Access
32
32
64
128
64
32

Pease
Remote
Local
Access Peformance Access
96
3900
128
96
3860
128
64
3900
128
0
640
128
64
3840
64
96
3880
64

Optimal
Remote
Access Performance
0
640
0
640
0
640
0
640
64
2560
64
2560
Performance: Speedup

Speedup

Speedup(4*N*log(N)/No. of
Clocks)

4
3.5
3
2.5
2
1.5
1
0.5
0
5

6

7

8

9

10

11

12

13

14

15

16

Address Size (Bits)

Speedup = S/C
C = no. of. clock cycles using 4-processor FFT engine
S = minimum no. of clock cycles using single processor
(4*2n*n,
2n is the number of points)
SPIRAL Approach
given

DSP Transform

given

Possible
Algorithms
Possible
Implementations
Performance
Evaluation

Intelligent Search

SPIRAL Search Space

(DFT, DCT, Wavelets etc.)

Computing Platform
(Pentium III, Pentium 4, Athlon,
SUN, PowerPC, Alpha, … )

adapted
implementation
Classical Code Generation System
given

DSP Transform
(DFT, DCT, Wavelets etc.)

Math/Algorithm
Expert
Expert
Programmer
Performance
Evaluation

given

Computing Platform
(Pentium III, Pentium 4, Athlon,
SUN, PowerPC, Alpha, … )

adapted
implementation
Algorithms = Ruletrees = Formulas
DCT8( II )

DCTn( II ) → P ⋅ ( DCTn(/II ) ⊕ DCTn(/IV ) ) ⋅ ( F2 ⊗ I n / 2 )
2
2

R1

DCT4( IV )

DCT4( II )
R1

DCT2( II )

DCT4( II )

DCT2( IV )

R3

F2

DCTn( IV ) → P ⋅ DCTn( II ) ⋅ S

R6

R6
( II )
2

R1

DST
R4

DCT2( II )

F2

R3

DCT2( II ) →

1
⋅ F2
2

DCT2( IV )

F2

R6

DST2( II )
R4

F2
Mathematical Framework: Summary
 fast algorithms represented as ruletrees (easy generation/manipulation)
and as formulas (can be translated into code)
 formulas built from few constructs and primitives
 many different algorithms/formulas generated from few rules
(combinatorial explosion)
 these algorithms are (essentially) equal in arithmetic cost,
but differ in data flow
Formula Generation

data base (extensible!)
data type

Formula Generator

rules

recursive
application

transforms

control search engine

ruletrees

formulas

runtime

export

formula
translation
(spl compiler)

translation

 written in GAP/AREP (computer algebra system)
 all computation/manipulation is symbolic
 exact arithmetic
 easy extensible rule and transform data base
 verification of rules and formulas

cut here for other
optimization problems
Number of Formulas/Algorithms
k

# DFT, size 2^k

# DCT-IV, size 2^k

1
2
3
4
5
6
7
8
9

1
6
40
296
27744
162570361280
~1.01 • 10^27
~2.31 • 10^61
~2.86 • 10^133

1
10
126
31242
1924443362
7343815121631354242
~1.07 • 10^38
~2.30 • 10^76
~1.06 • 10^153

 differ in data flow not in arithmetic cost
 exponential search space
Extensibility of SPIRAL
New transforms are readily included on the high level
(easy, due to SPIRAL’s framework)
New constructs and primitives (potentially required by radically
different transforms) are readily included in SPL
(moderate effort, due to template mechanism)
New instructions sets available (e.g., SSE) are included by
extending the SPL compiler
(doable one time effort)
Generated Vector Code DFTs: Pentium III
1.8

1.6

1.4

gflops

1.2

Intel MKL
Spiral SSE
Spiral C/F95 vect
Spiral C/F95
FFTW 2.1.3

1

0.8

0.6

0.4

0.2

0
4

5

6

7

8

9

10

11

12

13

n
DFT 2^n, Pentium III, 1 GHz, using Intel C compiler 6.0

speedups (to C code) up to factor of 2.5

More Related Content

What's hot

Dsp U Lec10 DFT And FFT
Dsp U   Lec10  DFT And  FFTDsp U   Lec10  DFT And  FFT
Dsp U Lec10 DFT And FFTtaha25
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2Ali Baig
 
Ff tand matlab-wanjun huang
Ff tand matlab-wanjun huangFf tand matlab-wanjun huang
Ff tand matlab-wanjun huangSagar Ahir
 
Digital Signal Processing Tutorial:Chapt 3 frequency analysis
Digital Signal Processing Tutorial:Chapt 3 frequency analysisDigital Signal Processing Tutorial:Chapt 3 frequency analysis
Digital Signal Processing Tutorial:Chapt 3 frequency analysisChandrashekhar Padole
 
Fft presentation
Fft presentationFft presentation
Fft presentationilker Şin
 
fft using labview
fft using labviewfft using labview
fft using labviewkiranrockz
 
Discrete Fourier Transform
Discrete Fourier TransformDiscrete Fourier Transform
Discrete Fourier TransformAbhishek Choksi
 
Image trnsformations
Image trnsformationsImage trnsformations
Image trnsformationsJohn Williams
 
The discrete fourier transform (dsp) 4
The discrete fourier transform  (dsp) 4The discrete fourier transform  (dsp) 4
The discrete fourier transform (dsp) 4HIMANSHU DIWAKAR
 
DSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier TransformDSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier TransformAmr E. Mohamed
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsArvind Devaraj
 
Chapter5 - The Discrete-Time Fourier Transform
Chapter5 - The Discrete-Time Fourier TransformChapter5 - The Discrete-Time Fourier Transform
Chapter5 - The Discrete-Time Fourier TransformAttaporn Ninsuwan
 
DSP_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_FOEHU - Lec 08 - The Discrete Fourier TransformDSP_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_FOEHU - Lec 08 - The Discrete Fourier TransformAmr E. Mohamed
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionKazuki Fujikawa
 
Design of FFT Processor
Design of FFT ProcessorDesign of FFT Processor
Design of FFT ProcessorRohit Singh
 

What's hot (20)

Dft,fft,windowing
Dft,fft,windowingDft,fft,windowing
Dft,fft,windowing
 
Dsp U Lec10 DFT And FFT
Dsp U   Lec10  DFT And  FFTDsp U   Lec10  DFT And  FFT
Dsp U Lec10 DFT And FFT
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2
 
Ff tand matlab-wanjun huang
Ff tand matlab-wanjun huangFf tand matlab-wanjun huang
Ff tand matlab-wanjun huang
 
Digital Signal Processing Tutorial:Chapt 3 frequency analysis
Digital Signal Processing Tutorial:Chapt 3 frequency analysisDigital Signal Processing Tutorial:Chapt 3 frequency analysis
Digital Signal Processing Tutorial:Chapt 3 frequency analysis
 
Fft presentation
Fft presentationFft presentation
Fft presentation
 
fft using labview
fft using labviewfft using labview
fft using labview
 
Discrete Fourier Transform
Discrete Fourier TransformDiscrete Fourier Transform
Discrete Fourier Transform
 
Image trnsformations
Image trnsformationsImage trnsformations
Image trnsformations
 
Fft ppt
Fft pptFft ppt
Fft ppt
 
The discrete fourier transform (dsp) 4
The discrete fourier transform  (dsp) 4The discrete fourier transform  (dsp) 4
The discrete fourier transform (dsp) 4
 
DSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier TransformDSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier Transform
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier Transforms
 
Chapter5 - The Discrete-Time Fourier Transform
Chapter5 - The Discrete-Time Fourier TransformChapter5 - The Discrete-Time Fourier Transform
Chapter5 - The Discrete-Time Fourier Transform
 
DSP_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_FOEHU - Lec 08 - The Discrete Fourier TransformDSP_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_FOEHU - Lec 08 - The Discrete Fourier Transform
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
Properties of dft
Properties of dftProperties of dft
Properties of dft
 
Sparse autoencoder
Sparse autoencoderSparse autoencoder
Sparse autoencoder
 
Design of FFT Processor
Design of FFT ProcessorDesign of FFT Processor
Design of FFT Processor
 
Fourier transform
Fourier transformFourier transform
Fourier transform
 

Viewers also liked

Iact social media mktg 102715
Iact social media mktg 102715Iact social media mktg 102715
Iact social media mktg 102715douglaslyon
 
Jonathan winteroct2015
Jonathan winteroct2015Jonathan winteroct2015
Jonathan winteroct2015douglaslyon
 
Ip clinic power point
Ip clinic power pointIp clinic power point
Ip clinic power pointdouglaslyon
 
2010 Leveraging technology for content delivery
2010 Leveraging technology for content delivery2010 Leveraging technology for content delivery
2010 Leveraging technology for content deliveryWCET
 
2011Challenges and Successess in Faculty Development
2011Challenges and Successess in Faculty Development2011Challenges and Successess in Faculty Development
2011Challenges and Successess in Faculty DevelopmentWCET
 
2010 Exploring online faculty workload
2010 Exploring online faculty workload2010 Exploring online faculty workload
2010 Exploring online faculty workloadWCET
 
2011Simpson Smackdown 2.0
2011Simpson Smackdown 2.02011Simpson Smackdown 2.0
2011Simpson Smackdown 2.0WCET
 
2010 Are They Students? Or Customers? Does it Matter?
2010 Are They Students? Or Customers? Does it Matter?2010 Are They Students? Or Customers? Does it Matter?
2010 Are They Students? Or Customers? Does it Matter?WCET
 
2010 Does Social Networking Equate to Social Learning?
2010 Does Social Networking Equate to Social Learning?2010 Does Social Networking Equate to Social Learning?
2010 Does Social Networking Equate to Social Learning?WCET
 
2011Opening New Frontiers
2011Opening New Frontiers2011Opening New Frontiers
2011Opening New FrontiersWCET
 
Videoconferencing creating the_virtual
Videoconferencing creating the_virtualVideoconferencing creating the_virtual
Videoconferencing creating the_virtualWCET
 
2010 Online Class Student Support Resources
2010 Online Class Student Support Resources2010 Online Class Student Support Resources
2010 Online Class Student Support ResourcesWCET
 
2011Building classroommemories622
2011Building classroommemories6222011Building classroommemories622
2011Building classroommemories622WCET
 
2010 Debose columbus pk
2010  Debose columbus pk2010  Debose columbus pk
2010 Debose columbus pkWCET
 
2010 Boyd pk
2010  Boyd pk2010  Boyd pk
2010 Boyd pkWCET
 
2011Approaches to Assessing Students
2011Approaches to Assessing Students2011Approaches to Assessing Students
2011Approaches to Assessing StudentsWCET
 
2011Coles Smackdown 2.0
2011Coles Smackdown 2.02011Coles Smackdown 2.0
2011Coles Smackdown 2.0WCET
 

Viewers also liked (20)

Iact social media mktg 102715
Iact social media mktg 102715Iact social media mktg 102715
Iact social media mktg 102715
 
Jonathan winteroct2015
Jonathan winteroct2015Jonathan winteroct2015
Jonathan winteroct2015
 
03 ddl
03 ddl03 ddl
03 ddl
 
Chapter14
Chapter14Chapter14
Chapter14
 
Ip clinic power point
Ip clinic power pointIp clinic power point
Ip clinic power point
 
Ch9
Ch9Ch9
Ch9
 
2010 Leveraging technology for content delivery
2010 Leveraging technology for content delivery2010 Leveraging technology for content delivery
2010 Leveraging technology for content delivery
 
2011Challenges and Successess in Faculty Development
2011Challenges and Successess in Faculty Development2011Challenges and Successess in Faculty Development
2011Challenges and Successess in Faculty Development
 
2010 Exploring online faculty workload
2010 Exploring online faculty workload2010 Exploring online faculty workload
2010 Exploring online faculty workload
 
2011Simpson Smackdown 2.0
2011Simpson Smackdown 2.02011Simpson Smackdown 2.0
2011Simpson Smackdown 2.0
 
2010 Are They Students? Or Customers? Does it Matter?
2010 Are They Students? Or Customers? Does it Matter?2010 Are They Students? Or Customers? Does it Matter?
2010 Are They Students? Or Customers? Does it Matter?
 
2010 Does Social Networking Equate to Social Learning?
2010 Does Social Networking Equate to Social Learning?2010 Does Social Networking Equate to Social Learning?
2010 Does Social Networking Equate to Social Learning?
 
2011Opening New Frontiers
2011Opening New Frontiers2011Opening New Frontiers
2011Opening New Frontiers
 
Videoconferencing creating the_virtual
Videoconferencing creating the_virtualVideoconferencing creating the_virtual
Videoconferencing creating the_virtual
 
2010 Online Class Student Support Resources
2010 Online Class Student Support Resources2010 Online Class Student Support Resources
2010 Online Class Student Support Resources
 
2011Building classroommemories622
2011Building classroommemories6222011Building classroommemories622
2011Building classroommemories622
 
2010 Debose columbus pk
2010  Debose columbus pk2010  Debose columbus pk
2010 Debose columbus pk
 
2010 Boyd pk
2010  Boyd pk2010  Boyd pk
2010 Boyd pk
 
2011Approaches to Assessing Students
2011Approaches to Assessing Students2011Approaches to Assessing Students
2011Approaches to Assessing Students
 
2011Coles Smackdown 2.0
2011Coles Smackdown 2.02011Coles Smackdown 2.0
2011Coles Smackdown 2.0
 

Similar to Knoxville aug02-full

Course-Notes__Advanced-DSP.pdf
Course-Notes__Advanced-DSP.pdfCourse-Notes__Advanced-DSP.pdf
Course-Notes__Advanced-DSP.pdfShreeDevi42
 
Advanced_DSP_J_G_Proakis.pdf
Advanced_DSP_J_G_Proakis.pdfAdvanced_DSP_J_G_Proakis.pdf
Advanced_DSP_J_G_Proakis.pdfHariPrasad314745
 
DIT-Radix-2-FFT in SPED
DIT-Radix-2-FFT in SPEDDIT-Radix-2-FFT in SPED
DIT-Radix-2-FFT in SPEDAjay Kumar
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
 
Dsp model exam qp
Dsp model exam qpDsp model exam qp
Dsp model exam qpRama Dhurai
 
Sampling and Reconstruction (Online Learning).pptx
Sampling and Reconstruction (Online Learning).pptxSampling and Reconstruction (Online Learning).pptx
Sampling and Reconstruction (Online Learning).pptxHamzaJaved306957
 
FourierTransform detailed power point presentation
FourierTransform detailed power point presentationFourierTransform detailed power point presentation
FourierTransform detailed power point presentationssuseracb8ba
 
Basics of edge detection and forier transform
Basics of edge detection and forier transformBasics of edge detection and forier transform
Basics of edge detection and forier transformSimranjit Singh
 
DIGITAL IMAGE PROCESSING - Day 4 Image Transform
DIGITAL IMAGE PROCESSING - Day 4 Image TransformDIGITAL IMAGE PROCESSING - Day 4 Image Transform
DIGITAL IMAGE PROCESSING - Day 4 Image Transformvijayanand Kandaswamy
 
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...TELKOMNIKA JOURNAL
 
A combined sdc
A combined sdcA combined sdc
A combined sdcsakthi1986
 
s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...
s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...
s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...sakthi1986
 
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...cscpconf
 
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...csandit
 
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...ijma
 

Similar to Knoxville aug02-full (20)

Course-Notes__Advanced-DSP.pdf
Course-Notes__Advanced-DSP.pdfCourse-Notes__Advanced-DSP.pdf
Course-Notes__Advanced-DSP.pdf
 
Advanced_DSP_J_G_Proakis.pdf
Advanced_DSP_J_G_Proakis.pdfAdvanced_DSP_J_G_Proakis.pdf
Advanced_DSP_J_G_Proakis.pdf
 
Fourier slide
Fourier slideFourier slide
Fourier slide
 
DIT-Radix-2-FFT in SPED
DIT-Radix-2-FFT in SPEDDIT-Radix-2-FFT in SPED
DIT-Radix-2-FFT in SPED
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
 
Dsp model exam qp
Dsp model exam qpDsp model exam qp
Dsp model exam qp
 
Sampling and Reconstruction (Online Learning).pptx
Sampling and Reconstruction (Online Learning).pptxSampling and Reconstruction (Online Learning).pptx
Sampling and Reconstruction (Online Learning).pptx
 
FourierTransform detailed power point presentation
FourierTransform detailed power point presentationFourierTransform detailed power point presentation
FourierTransform detailed power point presentation
 
Codes and Isogenies
Codes and IsogeniesCodes and Isogenies
Codes and Isogenies
 
Dft2
Dft2Dft2
Dft2
 
Basics of edge detection and forier transform
Basics of edge detection and forier transformBasics of edge detection and forier transform
Basics of edge detection and forier transform
 
Fourier transform
Fourier transformFourier transform
Fourier transform
 
Unit-1.pptx
Unit-1.pptxUnit-1.pptx
Unit-1.pptx
 
DIGITAL IMAGE PROCESSING - Day 4 Image Transform
DIGITAL IMAGE PROCESSING - Day 4 Image TransformDIGITAL IMAGE PROCESSING - Day 4 Image Transform
DIGITAL IMAGE PROCESSING - Day 4 Image Transform
 
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
 
A combined sdc
A combined sdcA combined sdc
A combined sdc
 
s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...
s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...
s.Magesh kumar DECE,BTECH,ME (ASAN MEMORIAL COLLEGE OF ENGINEERING AND TECHNO...
 
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
 
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
 
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu...
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Knoxville aug02-full

  • 1. SPIRAL: Current Status Markus Püschel Students Faculty • José Moura (CMU) • Jeremy Johnson (Drexel) • Robert Johnson (MathStar) • David Padua (UIUC) • Viktor Prasanna (USC) • Markus Püschel (CMU) • Manuela Veloso (CMU) • Gavin Haentjens (CMU) • Pinit Kumhom (Drexel) • Neungsoo Park (USC) • David Sepiashvili (CMU) • Bryan Singer (CMU) • Yevgen Voronenko (Drexel) • Edward Wertz (CMU) • Jianxin Xiong (UIUC) Collaborators • Christoph Überhuber (TU Vienna) • Franz Franchetti (TU Vienna) http://www.ece.cmu.edu/~spiral
  • 2. Sponsor Work supported by DARPA (DSO), Applied & Computational Mathematics Program, OPAL, through grant managed by research grant DABT63-98-1-0004 administered by the Army Directorate of Contracting.
  • 3. Moore’s Law and High(est) Performance Scientific Computing (single processor, off-the-shelf) Moore’s Law:  processor-memory bottleneck  short life cycles of computers  very complex architectures • vendor specific • special instructions (MMX, SSE, FMA, …) • undocumented features Consequences for software/algorithms:  arithmetic cost model not accurate for predicting runtime  better performance models hard to get  best code is machine dependent (registers/caches size, structure)  hand-tuned code becomes obsolete as fast as it is written  compiler limitations  full performance requires (in part) assembly coding Portable performance requires automation
  • 4. SPIRAL Automates Implementation  cuts development costs  code less error-prone Optimization  systematic exploration of alternatives both at algorithmic and code level Platform-Adaptation  takes advantage of architecture specific features  porting without loss of performance of DSP algorithms  are performance critical A library generator for highly optimized signal processing algorithms
  • 5. SPIRAL system goes for a coffee ian tic a controls algorithm generation Formula Generator fast algorithm as SPL formula ert Exp m SPL Compiler er ram rog P controls implementation options C/Fortran/SIMD code platform-adapted implementation Search Engine hem at M specifies runtime on given platform comes back (or an espresso for small transforms) SPIRAL DSP transform user
  • 7. DSP Algorithms: Example 4-point DFT Cooley/Tukey FFT (size 4): 1 1 1 1  1 1 i − 1 − i  0  = 1 − 1 1 − 1 1    1 − i − 1 i  0  Fourier transform 0 1 0  1 1 0 1  0  0 − 1 0  0  1 0 − 1 0 0 1 0 0 0 0 1 0 0 1 1 0 1 − 1  0  0 0  i  0 0 0 0  1 0 0  0  1 1  0  1 − 1 0 0 0 1 0 0 1 0 0 Diagonal matrix (twiddles) DFT4 = ( DFT2 ⊗ I 2 ) ⋅ T24 ⋅ ( I 2 ⊗ DFT2 ) ⋅ L4 2 Kronecker product Identity Permutation  product of structured sparse matrices  mathematical notation 0 0  0  1
  • 8. DSP Algorithms: Terminology Transform DFTn Rule DFTnm → ( DFTn ⊗ I m ) ⋅ D ⋅ ( I n ⊗ DFTm ) ⋅ P parameterized matrix • a breakdown strategy • product of sparse matrices DFT 8 Ruletree DFT 2 DFT 4 DFT 2 Formula DFT 2 • recursive application of rules • uniquely defines an algorithm • efficient representation • easy manipulation DFT8 = ( F2 ⊗ I 4 ) ⋅ D ⋅ ( I 2 ⊗ ( I 2 ⊗ F2 ) ) ⋅ P • few constructs and primitives • uniquely defines an algorithm • can be translated into code
  • 9. DSP Transforms discrete Fourier transform Walsh-Hadamard transform DFTn = [ exp( 2kliπ / n ) ] WHT2k = DFT2 ⊗ ⊗ DFT2 DCT ( II ) n = [ cos( k ( l + 1 / 2 )π / n ) ] DCT ( IV ) n = [ cos( ( k + 1 / 2) (l + 1 / 2)π / n ) ] discrete cosine and sine Transforms (16 types) modified discrete cosine transform DST ( I ) n = [ sin ( klπ / n ) ] MDCTn×2 n = [ cos( ( k + ( n + 1) / 2)(l +1 / 2)π / n ) ] two-dimensional transform T ⊗T Others: filters, discrete wavelet transforms, Haar, Hartley, …
  • 10. Rules = Breakdown Strategies ( ) DCT2( II ) → diag 1,1 / 2 ⋅ F2 ( ) DCTn( II ) → P ⋅ DCTn(/II ) ⊕ DCTn(/IV ) ⋅ ( I n / 2 ⊗ F2 ) 2 2 base case Q recursive DCTn( IV ) → S ⋅ DCTn( II ) ⋅ D translation DCTn( IV ) → M 1  M r DFTn → B ⋅ ( DCTn(/I2) ⊕ DSTn(/I2) ) ⋅ C iterative recursive DFTnm → ( DFTn ⊗ I m ) ⋅ D ⋅ ( I n ⊗ DFTm ) ⋅ P recursive Fn (h) → ( I n / d ⊗ k I d + k ) ⋅ ( I n / d ⊗ Fd (h)) recursive Fn (h) → Circ (h ) ⋅ E recursive DWTn (W ) → ( DWTn / 2 (W ) ⊕ I n / 2 ) ⋅ P ⋅ ( I n / 2 ⊗ k W ) ⋅ E recursive WHT2n → ∏ ( I 2n1+K+ni −1 ⊗ WHT2ni ⊗ I 2ni +1+K+nt ) iterative/ recursive MDCTn×2 n → S ⋅ DCTn( IV ) ⋅ P translation n i =1
  • 11. Formula for a DCT, size 16
  • 13. Formulas in SPL •••• ( compose ( diagonal ( 2*cos(1/16*pi) 2*cos(3/16*pi) 2*cos(5/16*pi) 2*cos(7/16*pi) ) ) ( permutation ( 1 3 4 2 ) ) ( tensor ( I 2 ) ( F 2 ) ) ( permutation ( 1 4 2 3 ) ) ( direct_sum ( compose ( F 2 ) ( diagonal ( 1 sqrt(1/2) ) ) ) ( compose ( matrix ( 1 1 0 ) ( 0 (-1) 1 ) ) ( diagonal ( cos(13/8*pi)-sin(13/8*pi) sin(13/8*pi) cos(13/8*pi)+sin(13/8*pi) ) ) ( matrix ( 1 0 ) ( 1 1 ) ( 0 1 ) ) ( permutation ( 2 1 ) ) ••••
  • 14. SPL Syntax (Subset)  matrix operations: (compose formula formula ...) (tensor formula formula ...) (direct_sum formula formula ...)  direct matrix description: (matrix (a11 a12 ...) (a21 a22 ...) ...) (diagonal (d1 d2 ...)) (permutation (p1 p2 ...))  parameterized matrices: (I n) (F n)  scalars: 1.5, 2/7, cos(..), w(3), pi, 1.2e-04 allows extension of SPL  definition of new symbols: (define name formula) (template formula (i-code-list)  directives for code generation #codetype real/complex controls loop unrolling #unroll on/off
  • 15. SPL Compiler, 4-point FFT (compose (tensor (F 2) (I 2)) (T 4 2) (tensor (I 2) (F 2)) (L 4 2)) fast algorithm as formula as SPL program #codetype complex f0 f1 f2 f3 f4 y(1) y(2) y(3) y(4) = = = = = = = = = x(1) + x(3) x(1) - x(3) x(2) + x(4) x(2) - x(4) (0.00d0,-1.00d0)*f(3) f0 + f2 f0 - f2 f1 + f4 f1 - f4 real r0 r1 r2 r3 r4 r5 r6 r7 y(1) y(2) y(3) y(4) y(5) y(6) y(7) y(8) = = = = = = = = = = = = = = = = x(1) x(1) x(2) x(2) x(3) x(3) x(4) x(4) r0 + r1 + r0 r1 r2 + r3 r2 r3 + + x(5) - x(5) + x(6) - x(6) + x(7) - x(7) + x(8) - x(8) r4 r5 r4 r5 r7 r6 r7 r6
  • 16. SPL Compiler: Summary SPL Program SPL Formula Symbol Definition Template Definition Parsing Abstract Syntax Tree Symbol Table Template Table Intermediate Code Generation I-Code Intermediate Code Restructuring I-Code Optimization I-Code Target Code Generation C, FORTRAN function Built-in optimizations:  single static assignment code  no reuse of temporary vars  only scalar temporary vars  constants precomputed  limited CSE Extensible through templates
  • 18. Why Search? DCT, type IV, size 16 ~31000 formulas • maaaany different formulas • large spread in runtimes, even for modest size • not due to arithmetic cost • best formula is platform-dependent
  • 19. Search Methods available in SPIRAL      Exhaustive Search Dynamic Programming (DP) Random Search Hill Climbing STEER (similar to a genetic algorithm) Possible Sizes Exhaust Very small Formulas Timed Results All Best DP All 10s-100s (very) good Random All User decided fair/good Hill Climbing All 100s-1000s Good STEER All 100s-1000s (very) good Search over • algorithm space and • implementation options (degree of unrolling)
  • 21. Experimental Results (C code) search methods (applicable to all transforms) high performance code (compared with FFTW) different transforms generated high quality code
  • 22. SPIRAL System  Available for download (v3.1): www.ece.cmu.edu/~spiral  Easy installation (Unix: configure/make; Windows: install shield)  Unix/Linux and Windows 98/ME/NT/2000/XP  Current transforms: DFT, DHT, WHT, RHT, DCT/DST type I – IV, MDCT, Filters, Wavelets, Toeplitz, Circulants  Extensible  New version (4.0) in preparation
  • 24. Learning to Generate Fast Algorithms • Learns from given dataset (formulas+runtimes) how to design a fast algorithm (breakdown strategy) • Learns from a transform of one size, generates the best algorithm for many sizes • Tested for DFT and WHT
  • 25. SIMD Short Vector Extensions vector length = 4 (4-way) x +  Extension to instruction set architecture  Available on most current architectures (SSE on Pentium, AltiVec on Motorola G4)  Originally for multimedia (like MMX for integers)  Requires fine grain parallelism  Large potential speed-up Problems:  SIMD instructions are architecture specific  No common API (usually assembly hand coding)  Performance very sensitive to memory access  Automatic vectorization very limited very difficult to use
  • 26. Vector code generation from SPL formulas Naturally vectorizable construct A ⊗ I4 A vector length x y (Current) generic construct completely vectorizable: k ∏ P D ( A ⊗ Iυ ) E Q i i i i =1 i i Pi, Qi Di, Ei Ai ν permutations diagonals arbitrary formulas SIMD vector length Vectorization in two steps: 1. Formula manipulation using manipulation rules 2. Code generation (vector code + C code)
  • 27. Generated Vector Code DFTs: Pentium 4 7 6 Spiral SSE gflops 5 Intel MKL interl. 4 FFTW 2.1.3 Spiral C and F95 3 Spiral F95 vect SIMD-FFT 2 1 0 4 5 6 7 8 9 10 11 12 13 14 n DFT 2^n, Pentium 4, 2.53 GHz, using Intel C compiler 6.0  speedups (to C code) up to factor of 3.3  beats hand-tuned vendor library
  • 28. Generated Vector Code, Other Transforms WHT normalized runtime normalized runtime 2-dim DCT transform size transform size speedups up to factor of 2.5
  • 29. Flexible Loop Interleaving (Runtime Gain WHT) Athlon XP: up to 55% Alpha 21264: up to 70% Pentium 4: up to 45% UltraSparc III: up to 60%
  • 30. Parallel Code Generation: Example WHT 10 PowerPC RS64 III Speedup 8 6 4 1 thread 8 threads 10 threads 2 0 1 6 11 16 WHT size log(N) 21 26 Parallelized constructs: In ⊗ A, A ⊗ In
  • 31. Code Scheduling Runtime histograms DFT, size 16 ~6500 formulas DCT4, size 16 ~16000 formulas unscheduled scheduled
  • 32. Filters and Wavelets New constructs: row/column overlapped tensor product A A A I n ⊗k A = A I n ⊗k A = A A A Examples for rules: Fn (h) → ( I n / d ⊗ k I d + k ) ⋅ ( I n / d ⊗ Fd (h)) DWTn (W ) → ( DWTn / 2 (W ) ⊕ I n / 2 ) ⋅ P ⋅ ( I n / 2 ⊗ k W ) ⋅ E A
  • 33. Conclusions  Automatic code generation for the entire domain of (linear) DSP algorithms  Portable high performance across platforms and across time  Integration of math (high) level and implementation (low) level  Intelligence through search and learning in the space of alternatives
  • 34. Future Plans  Transforms: Radon, Gabor, Hankel, structured matrices, …  Target platforms: parallel platforms, DSP processors, SW/HW architectures, FPGAs, ASICs  Instructions: Vector, FMAs, prefetching, OpenMP, MPI  Beyond transforms: entire DSP applications  Other domains amenable to SPIRAL approach?
  • 37. Generating Parallel Programs  Interpret constructs such as In ⊗ A as parallel operations and transform formulas to obtain maximal parallelism.  Explore alternative data access patterns mathematically (e.g. different permutations in matrix factorizations)  Prototype implementation using WHT  Build on existing sequential package  SMP implementation using OpenMP (IPDPS’02)  90% efficiency obtained on 12 processor PowerPC RS64 III  Distributed memory implementation using MPI (POHLL’02)
  • 38. Comparison of Parallel DDL Schemes 10 Speedup PowerPC RS64 III 10 threads 8 6 4 2 0 1 6 11 16 21 WHT size log(N) 26 Best sequential with DDL Parallel without DDL Coarse-grained DDL Fine-grained DDL with ID shift
  • 39. Overall Parallel Speedup 10 PowerPC RS64 III 6 Speedup 8 1 thread 8 threads 10 threads 4 2 0 1 6 11 16 WHT size log(N) 21 26
  • 40. Performance of Digit Permutations on CRAY T3E Distribution of Global Tensor Permutations on 128 Processors (shmem put) 1200 Number of tensor permutations 1000 800 600 400 200 0 0 50 100 150 200 250 Bandwidth in MB/sec/node 300 350
  • 41. Architecture Framework P4 ( I 8 ⊗ F2 )T4 P4− 1 P3 ( I 8 ⊗ F2 )T3 P3− 1 P2 ( I 8 ⊗ F2 )T2 P2− 1 P1 ( I 8 ⊗ F2 )T1P1− 1 Pd Host I/O Interface Parameters: Dimension, Pd I/O Interface M = Memory AG = Address Generator CU = Computation Unit Interconnection Network Parameters: Pi, 1 ≤ i ≤ n, no. of processor AG CU AG CU AG M M CU AG ... M CU AG M Parameters: Dimension, and Ti, 1 ≤ i ≤ n, no. of processor 2m CU
  • 42. Pease Algorithm Dataflow L16 ( I 8 ⊗ F2 )T3 L16 L16 ( I 8 ⊗ F2 )T2 L16 L16 ( I 8 ⊗ F2 )T1 L16 ( I 8 ⊗ F2 ) Pd 2 8 4 4 8 2 Stage 0 Stage 1 Stage 2 L16 2 Stage 3 0 0 1 2 2 4 3 6 4 8 5 10 6 12 7 14 8 1 9 3 10 5 11 7 12 9 13 11 14 13 15 15
  • 43. Optimal Dataflow L16 ( L82 ⊗ I 2 )( I 8 ⊗ F2 )T3 ( L84 ⊗ I 2 ) L16 L16 ( L82 ⊗ I 2 )( I 8 ⊗ F2 )T2 ( L84 ⊗ I 2 ) L16 L16 ( I 8 ⊗ F2 )T1 L16 ( L84 ⊗ I 2 )( I 8 ⊗ F2 )( L82 ⊗ I 2 ) Pd 2 8 4 4 8 2 Stage 0 Stage 1 Stage 2 Stage 3
  • 44. Pease v.s. Optimal Performance comparison of Pease algorithm and optimal algorithm no. of clock cycles 12000 10000 8000 Pease 6000 Optimal 4000 2000 0 5 6 7 8 FFT size Stage 1 2 3 4 5 6 Local Access 32 32 64 128 64 32 Pease Remote Local Access Peformance Access 96 3900 128 96 3860 128 64 3900 128 0 640 128 64 3840 64 96 3880 64 Optimal Remote Access Performance 0 640 0 640 0 640 0 640 64 2560 64 2560
  • 45. Performance: Speedup Speedup Speedup(4*N*log(N)/No. of Clocks) 4 3.5 3 2.5 2 1.5 1 0.5 0 5 6 7 8 9 10 11 12 13 14 15 16 Address Size (Bits) Speedup = S/C C = no. of. clock cycles using 4-processor FFT engine S = minimum no. of clock cycles using single processor (4*2n*n, 2n is the number of points)
  • 46. SPIRAL Approach given DSP Transform given Possible Algorithms Possible Implementations Performance Evaluation Intelligent Search SPIRAL Search Space (DFT, DCT, Wavelets etc.) Computing Platform (Pentium III, Pentium 4, Athlon, SUN, PowerPC, Alpha, … ) adapted implementation
  • 47. Classical Code Generation System given DSP Transform (DFT, DCT, Wavelets etc.) Math/Algorithm Expert Expert Programmer Performance Evaluation given Computing Platform (Pentium III, Pentium 4, Athlon, SUN, PowerPC, Alpha, … ) adapted implementation
  • 48. Algorithms = Ruletrees = Formulas DCT8( II ) DCTn( II ) → P ⋅ ( DCTn(/II ) ⊕ DCTn(/IV ) ) ⋅ ( F2 ⊗ I n / 2 ) 2 2 R1 DCT4( IV ) DCT4( II ) R1 DCT2( II ) DCT4( II ) DCT2( IV ) R3 F2 DCTn( IV ) → P ⋅ DCTn( II ) ⋅ S R6 R6 ( II ) 2 R1 DST R4 DCT2( II ) F2 R3 DCT2( II ) → 1 ⋅ F2 2 DCT2( IV ) F2 R6 DST2( II ) R4 F2
  • 49. Mathematical Framework: Summary  fast algorithms represented as ruletrees (easy generation/manipulation) and as formulas (can be translated into code)  formulas built from few constructs and primitives  many different algorithms/formulas generated from few rules (combinatorial explosion)  these algorithms are (essentially) equal in arithmetic cost, but differ in data flow
  • 50. Formula Generation data base (extensible!) data type Formula Generator rules recursive application transforms control search engine ruletrees formulas runtime export formula translation (spl compiler) translation  written in GAP/AREP (computer algebra system)  all computation/manipulation is symbolic  exact arithmetic  easy extensible rule and transform data base  verification of rules and formulas cut here for other optimization problems
  • 51. Number of Formulas/Algorithms k # DFT, size 2^k # DCT-IV, size 2^k 1 2 3 4 5 6 7 8 9 1 6 40 296 27744 162570361280 ~1.01 • 10^27 ~2.31 • 10^61 ~2.86 • 10^133 1 10 126 31242 1924443362 7343815121631354242 ~1.07 • 10^38 ~2.30 • 10^76 ~1.06 • 10^153  differ in data flow not in arithmetic cost  exponential search space
  • 52. Extensibility of SPIRAL New transforms are readily included on the high level (easy, due to SPIRAL’s framework) New constructs and primitives (potentially required by radically different transforms) are readily included in SPL (moderate effort, due to template mechanism) New instructions sets available (e.g., SSE) are included by extending the SPL compiler (doable one time effort)
  • 53. Generated Vector Code DFTs: Pentium III 1.8 1.6 1.4 gflops 1.2 Intel MKL Spiral SSE Spiral C/F95 vect Spiral C/F95 FFTW 2.1.3 1 0.8 0.6 0.4 0.2 0 4 5 6 7 8 9 10 11 12 13 n DFT 2^n, Pentium III, 1 GHz, using Intel C compiler 6.0 speedups (to C code) up to factor of 2.5

Editor's Notes

  1. This graph shows the parallel speedup of the WHT transform with 10 threads at different data sizes. The effect of granularity on parallel pseudo transpose is obvious. The fine-grained is better than the coarse-grained. It is different from the transform part. The yellow line shows the speedup of work-sharing OpenMP. There is only 3 times speedup even though 10 threads are used. This indicates, although OpenMP provides the convenience to parallelize iteration automatically, the efficiency could be very low depending on the nature of the problem.
  2. The dataflow of the conjugating form of the Pease Algorithm.