Experiences with Numerics on OpenBSD - 1

Experiences with Numerics on OpenBSD - 1 - RNG

I was porting the CUBE cosmological simulation code and had decided to start at the beginning. The Github repo had some demo / test code so it seemed the best way forward was to walk, not “run”, ha ha.

RNG means Random Number Generators.

FFT means (fast) Fourier Transform. The FFTW3 library is one of the popular and well-tested FFT libraries around.

This experience was a trip through debugging and hypothesizing about fftw3, compiling, testing, retesting, code-reading and more.

The fft and OpenMP-FFT testers

The cube_fft subdirectory had three F90 test programs, for testing FFTs and OpenMP and combinations. Well, once you read the code there was no OpenMP in there. But the code was doing 3-D FFTs on cubic arrays.

So build (on OpenBSD 6.6, flang 8.0.1, fftw3 3.3.8, g95 8.3.0), using included build script, and discover g95 conflicts with flang over the location of mod files. Move the files, report the bug, and switch to flang (*).

echo $p
-I/usr/local/include -I/usr/local/include/flang -L/usr/local/lib
echo $libf
-lfftw3f

flang -O3 -o testfft.x $p $libf -x f95 test.f90.orig
./testfft.x
...(snip)...
 check results
     NaN             NaN             NaN             NaN
     NaN             NaN             NaN             NaN
     NaN             NaN

Whoah there. NaN from an FFT? Shouldn’t anything be allowed as input and therefore generate valid output?

(*) Note that flang is no longer available on OpenBSD from 6.8 onward.

GIGO thoughts

My first thought was gigo, that the input to the FFT was bad, hence the output was bad. But how to test?

subroutine checknan(d,ng)
    implicit none
    integer, intent(in) :: ng
    real, intent(in) :: d(:,:,:)

    integer i,j,k
    outer1: do k = 1, ng
    do j = 1, ng
    do i = 1, ng
        if(ieee_is_nan(den(i,j,k))) then
           type *, ' nan at', i, j, k
           exit outer1
        end if
    enddo
    enddo
    enddo outer1
endsubroutine checknan

And run this before and after the FFT routine. But no, only the output had NaN, the input was “clean”.

So, count the NaNs:

subroutine checknan(d,ng)
    implicit none
    integer, intent(in) :: ng
    real, intent(in) :: d(:,:,:)

    integer i,j,k, nancount
    nancount=0
    outer1: do k = 1, ng
    do j = 1, ng
    do i = 1, ng
        if(ieee_is_nan(den(i,j,k))) then
           nancount = nancount + 1
        end if
    enddo
    enddo
    enddo outer1
    if(nancount > 0) then
        print *,' nans found, ', nancount
    end if
endsubroutine checknan

And, as it turned out, there were lots:

  nans found,      14047232

Which was closely related to the array size (306 * 304 * 304), 14047232 being (152 * 304 * 304).

It makes sense. FFT works by mixing lots of the numbers throughout the array with lots of arithmetic: a few invalid numbers could spread throughout the array.

And, sometimes, only 14046624 elements were NaNs. Huh again!

Input is only semi-good?

Reading the various manuals, I rediscovered the IEEE_ builtins, and the IEEE_IS_NORMAL function which tests for a number that is not denormalized, infinity, or NaN.

But IS_NORMAL always counted 14047232 on the random numbers even when NaNs numbered 14046624.

Back to the code again.

Read the code

There were two processes going on in this test code. Generate some artificial data using random numbers and an angle and log density function. Then run FFT on the artificial data:

integer, parameter :: ng = 304
real den(ng + 2, ng, ng)
....
call random_number(den(:ng, :, :))  !!! fill with random [0,1)
....
    temp_theta = 2*pi*den(i, j, k)
    temp_r = sqrt(-2*log(1 - den(i + 1, j, k)))
    den(i, j, k) = temp_r*cos(temp_theta)
    den(i + 1, j, k) = temp_r*sin(temp_theta)
...
call sfftw_execute(plan_fft) ! forward transform
...

Which, in words, is a weighted cosine and sine of a random angle. The weight includes sqrt( negative * something ). Which is a hint. And the something is log(1 - something). Another hint.

Both sqrt and log are sensitive to values outside their valid domain: zero and positive for sqrt, and positive for log.

The argument to log() is supposed to be positive since den is filled with random numbers strictly less than one. According to the spec of random_number that is.

So, count more badnesses (and include hexadecimal representation of the floating point number):

    if(nancount /= notnormalcount) then
    outer3: do k = 1, ng
    do j = 1, ng
    do i = 1, ng
        if((.not. ieee_is_nan   (den(i,j,k))) .and. &
           (.not. ieee_is_normal(den(i,j,k))) ) then
           print '(A,3I5,F16.7,Z12)', ' unnormal ', i,j,k, &
             den(i,j,k), den(i,j,k)
           exit outer3
        end if
    enddo
    enddo
    enddo outer3

and find this output when checking the sqrt(log()) expression:

unnormal     7  165  111            -Inf    FF800000

Which confirms the sqrt(log(0)) = negative infinity. And the expression above, log(1-den) is in fact the source of the problem!

Now we can check the random number generator (RNG):

subroutine checkrngone(d,ng)
    implicit none
    integer, intent(in) :: ng
    real, intent(in) :: d(:,:,:)

    integer i,j,k
    outer1: do k = 1, ng
    do j = 1, ng
    do i = 1, ng
        if(den(i,j,k) >= 1.0) then
           print '(A,3I5,F16.7,Z12)', ' snbone ', i,j,k, &
             den(i,j,k), den(i,j,k)
           exit outer1
        end if
    enddo
    enddo
    enddo outer1

And run this test after filling the array with “random numbers less than one”. And get a hit:

snbone    81   79  104       1.0000000    3F800000

Aha! But why? Is there a bug in the runtime?

Read the code for the flang runtime (runtime/flang/random3f.c):

...
extern double drand48();

double ENT3F(RANDOM, random)(void) { return drand48(); }
...

Which is returning a double (real(8) for those of us who like it that way).

Which means, go back to read the code:

Truth emerges

Remember this?

real den(ng + 2, ng, ng)
....
call random_number(den(:ng, :, :))  !!! fill with random [0,1)

What we were seeing, but didn’t understand, was an implied

den = round_to_real4_number(real8_number)

Any real(8) value slightly less than 1.0D0 was rounded to the real(4) value 1.0E0. Which was emphatically not a bug, because it is the standard way for Fortran conversion from real(8) to real(4). It’s not a bug in the flang runtime anyway. But it is a bug in this test code. So, to fix:

use ieee_arithmetic, only: ieee_is_nan, ieee_is_normal, &
  ieee_set_rounding_mode, ieee_round_type, ieee_to_zero
...
call ieee_set_rounding_mode(ieee_to_zero)
...
call random_number(den(:ng, :, :))
call checkrngone(den, ng)
...

And after a few hundred million random numbers are generated and tested, it seems to work.

December 2019 (revised November 2020)

Glossary

FFT: Fast Fourier Transform, quickly obtaining the correlations of a signal with cosine and sine waves of progressive frequencies. Reversible. Useful.
gigo: Garbage-in produces Garbage-out.
IEEE: Institute of Electrical and Electronics Engineers. An international professional association which publishes standards and academic journals and organizes academic conferences. A shorthand for the IEEE-754 standard.
NaN: Not A Number. A representation of concepts like 1.0/0.0, and yet which is not infinity.
RNG: Random Number Generator. Not really, these are pseudo-random. And the utility of most RNGs is improved by making them reproducible, especially if you are debugging numerical code.

Links:

FFTW home

Flang compiler source

IEEE.org

Hao-ran Yu’s CUBE