Experiences with Numerics on OpenBSD - 3 - FFTW

The Fast Fourier Transform (FFT) and its various implementations, including the popular FFTW3 library, are well known in the signal-processing business. And since just about everybody processes signals as a lifestyle, it’s a good thing to know well.

The CUBEnu code uses the FFT for convolution: converting to the frequency domain, applying a windowing function to the frequency-domain data, then inverting the Fourier transform to get back to the spatial domain. The window function is just a sin().
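
The three steps above (forward transform, multiply by a window, inverse transform) can be sketched in a few lines. This is a 1-D illustration only, not the CUBEnu code and not FFTW: a naive O(n^2) DFT stands in for the FFT library, and the names dft, sine_window, and bandpass, as well as the band limits lo and hi, are made up for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; a stand-in for an FFT library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        # The inverse transform carries the 1/n normalisation.
        out.append(acc / n if inverse else acc)
    return out


def sine_window(n, lo, hi):
    """Half-sine bump over bins lo..hi (hypothetical band limits, hi < n/2).

    The bump is mirrored onto the negative-frequency bins so that the
    filtered signal stays real.
    """
    w = [0.0] * n
    for k in range(lo, hi + 1):
        w[k] = math.sin(math.pi * (k - lo + 1) / (hi - lo + 2))
        w[(n - k) % n] = w[k]
    return w


def bandpass(signal, lo, hi):
    """Convolve by multiplying in the frequency domain, then transform back."""
    spectrum = dft(signal)
    window = sine_window(len(signal), lo, hi)
    filtered = [s * w for s, w in zip(spectrum, window)]
    return [v.real for v in dft(filtered, inverse=True)]


# Two tones, 3 and 10 cycles per record; the window is centred on bin 3.
n = 32
sig = [math.sin(2 * math.pi * 3 * t / n) + math.sin(2 * math.pi * 10 * t / n)
       for t in range(n)]
out = bandpass(sig, 2, 4)
err = max(abs(o - math.sin(2 * math.pi * 3 * t / n)) for t, o in enumerate(out))
print("largest deviation from the 3-cycle tone:", err)
```

With the bump centred on bin 3, the 10-cycle tone falls outside the window and is removed, while the 3-cycle tone passes with unit weight. A real code would replace dft() with planned FFTW calls; the windowing step in the middle stays the same.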

Test the FFTW calling code

The code fragments call the planning and execution routines according to most of the recommendations in the documentation (FFTW version 3.3.7, October 2017). A couple of issues remain: using the recommended Fortran routines dfftw_execute_dft, dfftw_execute_dft_r2c, etc., and the slightly different routine names for single-precision Fortran (an sfftw_ prefix instead of dfftw_).

One possible option is the unaligned-data flag (FFTW_UNALIGNED), which is not currently used; the consequence of not using it is a complicated alignment-compatible allocation mechanism. I didn’t try that, at least not in the Fortran version. See section 7.4 of the FFTW doc for details.

Building FFTW with CPU-specific options

So, totally abandoning the OpenBSD packages for a build-your-own approach is not a good idea. But if you absolutely, positively need the highest performance, this might be worth the trouble – which is mostly the issue of reproducing this build a year or three from now, when you no longer remember how.

Here is my method:

cd
mkdir clang gcc src
cd src
ftp https://path.to.source.files.org/fftw-3.3.8.tar.gz
tar xzf fftw-3.3.8.tar.gz

cd fftw-3.3.8
./configure --help
./configure --quiet \
            --prefix=$HOME/clang \
            --disable-doc \
            --enable-threads \
            --enable-sse2 \
            --enable-float \
       CC=cc CFLAGS="-O3 -march=penryn" \
       F77=flang
gmake
gmake check
gmake install
gmake clean

Then repeat without --enable-float to build the double-precision library as well.

For GCC, use CC=egcc and CFLAGS="-O3 -march=core2 -mtune=core2".

./configure --quiet \
            --prefix=$HOME/gcc \
            --disable-doc \
            --enable-threads \
            --enable-sse2 \
            --enable-float \
       CC=egcc CFLAGS="-O3 -march=core2 -mtune=core2" \
       F77=egfortran
gmake
gmake check
gmake install
gmake clean

Then do another run without --enable-float.

In spite of all this work, the result is very close to the OpenBSD package results. Custom compilation has no useful effect on Penryn (Core2).

Alignment

In OpenBSD, a malloc() for a large array usually results in a page-aligned allocation. Just out of curiosity, I tried a few 4-byte offsets from such a page-aligned pointer and got a small but consistent slowdown of about 7%. Don’t do that.

The alignment option that the FFTW3 documentation describes does work, but it is not faster, at least not on Penryn CPUs.
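
To see why a 4-byte offset matters, it helps to look at the addresses themselves. The sketch below does not measure speed and does not call FFTW; it only checks where a few offsets from a page-aligned buffer land relative to a 16-byte boundary, which is assumed here to be the alignment that SSE2-style SIMD loads want. An anonymous mmap() stands in for a large malloc(), and base_address and misalignment are names made up for this example.

```python
import ctypes
import mmap

SIMD_ALIGN = 16  # bytes; assumed alignment for SSE2-style SIMD loads


def base_address(buf):
    """Return the memory address of the first byte of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def misalignment(address, boundary=SIMD_ALIGN):
    """Bytes past the previous multiple of `boundary`; 0 means aligned."""
    return address % boundary


# An anonymous mapping is page-aligned, much like a large malloc() on OpenBSD.
page = mmap.mmap(-1, 4 * mmap.PAGESIZE)
base = base_address(page)

for offset in (0, 4, 8, 12, 16):
    addr = base + offset
    print(f"offset {offset:2d}: "
          f"{misalignment(addr)} bytes past a {SIMD_ALIGN}-byte boundary, "
          f"{misalignment(addr, mmap.PAGESIZE)} bytes past a page boundary")
```

Only offsets that are multiples of 16 keep the SIMD alignment; the 4-, 8-, and 12-byte offsets do not, which is consistent with the slowdown seen above.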

January 2020

Glossary

convolution: Calculating the response of a system to an input signal. Here, in CUBEnu, the system is a narrow-band filter (a bump similar to a sin() wave in the frequency domain).
FFT: Fast Fourier Transform, quickly obtaining the correlations of a signal with cosine and sine waves of progressive frequencies. Reversible. Useful.

Links

FFTW Home Page
Smith - The Scientist’s and Engineer’s Guide to Digital Signal Processing