ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
Desain dan implementasi fftw3
1. Desain dan Implementasi FFTW3
Abstrak-FFTW merupakan implementasi dari Fourier diskrit
transform (DFT) yang menyesuaikan perangkat keras untuk
memaksimalkan kinerja. Makalah ini menunjukkan bahwa pendekatan
semacam itu
dapat menghasilkan sebuah implementasi yang kompetitif dengan
handoptimized
perpustakaan, dan menggambarkan struktur perangkat lunak yang
membuat versi FFTW3 kami saat ini fleksibel dan adaptif. kami
membahas lebih lanjut algoritma baru untuk data nyata DFTs utama,
suatu cara baru dalam menerapkan DFTs melalui petunjuk mesin-spesifik
"SIMD", dan bagaimana sebuah kompiler dengan tujuan khusus dapat
memperoleh implementasi yang dioptimalkan dari kosinus diskrit dan
transformasi sinus
secara otomatis dari algoritma DFT.
Indeks Syarat-FFT yaitu, software adaptif, Transformasi Fourier,
transformasi kosinus, transformasi Hartley, I / O tensor.
FFTW is a widely used free-software library that
computes the discrete Fourier transform (DFT) and its
various special cases. Its performance is competitive even with
vendor-optimized programs, but unlike these programs, FFTW
is not tuned to a fixed machine. Instead, FFTW uses a planner
to adapt its algorithms to the hardware in order to maximize
performance. The input to the planner is a problem, a multidimensional
loop of multi-dimensional DFTs. The problem is
a data structure that describes the “shape” of the input data—
array sizes and memory layouts—but does not contain the data
itself.
FFTW is fast, but its speed does not come at the expense of
flexibility. In performance-critical applications, many
transforms of the same size are typically required, and therefore
a large one-time cost is usually acceptable. When the problem becomes “sufficiently simple,”
FFTW produces
a plan that calls a fragment of optimized straight-line code
that solves the problem directly. These fragments are called
codelets in FFTW’s lingo. You can envision a codelet as
computing a “small” DFT, but many variations and special
cases exist. FFTW’s codelets are generated automatically by a specialpurpose
compiler called genfft.
Users with special needs can use genfft
to generate their own codelets. genfft is useful because
of the following features. From a high-level mathematical
2. description of a DFT algorithm, genfft derives an optimized
implementation automatically.
This paper
describes the main ideas common to all FFTW systems, the
runtime structure of FFTW3, and the modifications to genfft
since FFTW2.