After the graphics/imagery work of the last experience, I thought to revisit, once again, what ‘quality’ means for pseudo-random number generators.
I added code for the xoroshiro128+ RNG, which uses combinations of xor, rotate, and shift operations to generate random numbers quickly.
On the Raspberry Pi 4 running Raspbian:
$ ./xmcpi.x
xmcpi.f90 number of images: 1
approximating pi in 1 x 10100100100 points
nsum 7932562674.0000000 msum 10100100100.000000
pi 3.1415926535897931 iterated pi: 3.1415778439661208
pi error 1.4809623672284999E-005
Elapsed wall clock time 278. seconds, using 1 images.
pi@raspberrypi:~/rnums $ mpirun -np 2 -H localhost:4 ./xmcpi.x
xmcpi.f90 number of images: 2
approximating pi in 2 x 10100100100 points
nsum 7932590015.0000000 msum 10100100100.000000
nsum 7932562674.0000000 msum 10100100100.000000
pi 3.1415926535897931 iterated pi: 3.1415832579718690
pi error 9.3956179241239113E-006
Elapsed wall clock time 279. seconds, using 2 images.
$ mpirun -np 4 -H localhost:4 ./xmcpi.x
xmcpi.f90 number of images: 4
approximating pi in 4 x 10100100100 points
nsum 7932562674.0000000 msum 10100100100.000000
nsum 7932637592.0000000 msum 10100100100.000000
nsum 7932649267.0000000 msum 10100100100.000000
nsum 7932590015.0000000 msum 10100100100.000000
pi 3.1415926535897931 iterated pi: 3.1415965419986285
pi error -3.8884088353619006E-006
Elapsed wall clock time 403. seconds, using 4 images.
The RPi4 overheats and throttles, resulting in 400+ seconds of elapsed time when running on 4 cores at once. However, the results are numerically identical to those from the AMD64-compatible Intel CPUs.
So by this very simple empirical test, the xoroshiro128+ code is platform-independent.
So let’s try a different test: the statistical test suites frequently mentioned by RNG practitioners (authors, users, critics). This is the TestU01 suite, which includes a number of well-known RNGs and a scheme for testing anything we might come up with.
The code is, roughly
#include "unif01.h"
#include "bbattery.h"
#include "xoroshiro128plus_xd_xjr.h"
#include "xoroshiro128plus_xtc.h"
#include "pcg_od_ojr.h"
int main (void)
{
    unif01_Gen *gen;

    /* seed the generator's two 64-bit state words */
    xoroshiro128plus_srandom(134893149852ull, 18324891485ull);

    /* wrap the double-in-[0,1) function for TestU01 and run the battery */
    gen = unif01_CreateExternGen01 ("xoroshiro128plus_xd", xd_random_r8);
    bbattery_SmallCrush (gen);
    unif01_DeleteExternGen01 (gen);
    ...
This is fairly dense code, but it basically runs the SmallCrush battery of tests on the double-precision floating-point values generated by xoroshiro128plus_xd.
The test subjects are the RNG/float-conversion combinations named in the includes above.
SmallCrush is the shorter suite, and all of its tests pass.
Crush is a longer suite; of its tests, only 3 fail with unlikely p-values (outside 0.001 to 0.999).
So these RNGs are not so bad, according to TestU01.
Looking at the binary (or hexadecimal) representation of the floating point results, it’s pretty clear there are issues with the [1,2)-1 and the times-a-constant methods of creating floating-point numbers from integer random numbers.
Some of the least significant bits are either always zero or mostly zero. For example, the frequency of 1-bits in the “xtc” method gives
bit nbr actual expected
bit 0 2522555 5050050
bit 1 3786768 5050050
bit 2 4416094 5050050
bit 3 4733053 5050050
bit 4 4893247 5050050
bit 5 4969861 5050050
and the “xjr” method gives
bit 0 0 5050050
bit 1 2523599 5050050
bit 2 3785404 5050050
bit 3 4415926 5050050
bit 4 4734824 5050050
bit 5 4892002 5050050
meaning the least-significant bit is never 1, and the next few bits are not 1 half the time as they should be. (The number 5050050 is the expected count of ones.)
The “good” method by Downey looks like
bit 0 5048674 5050050
bit 1 5048621 5050050
bit 2 5050834 5050050
bit 3 5048751 5050050
bit 4 5049737 5050050
bit 5 5049658 5050050
What is an objective, scientific method to measure this? Chi-square, of course; it is a test for discrete distributions with large counts. Here we have a stored mantissa of 23 bits (single-precision) or 52 bits (double-precision), all of which should be independent and uniformly 1 or 0 half the time. We’ll run a chi-square test on the distribution of the 1-bits in these mantissas.
And for good measure I added a test of half-octets (nibbles, 4 bits), whose values should average 7.5.
The other measure is the frequency of exponents. This should be geometric: numbers ¼ the size should have an exponent ¼ as likely. For example
Expon. Freq.
...
-20 98
-19 167
-18 388
-17 753
-16 1480
...
-4 6258023
-3 12508379
-2 25025356
-1 50049539
This is easily tested by taking log2 of the frequency and checking that log2(freq) is linear with the exponent.
(The above list of exponent frequencies is from od_random_r4 for 10G values. The smaller exponents -31 through -21 have been omitted.)
Most of the single-precision methods work fine, since they essentially take 64 bits (or 48, for the drand48 builtin) and use only 23 of them. Any issues with lsbs do not appear.
The acceptable methods for single precision are the PCG, xoroshiro128+, and xoroshiro128++ RNGs with the Downey (d), the [1,2)-1 (jr), and the times-a-constant conversions. These also pass the half-octet tests.
Not surprisingly, for double precision only the Downey (d) methods pass the bits tests. The xoroshiro128+ passes the half-octet tests, while PCG passes only about a third of the time. As the half-octet tests are biased (some bits have more influence on the average value of a half-octet than others), it’s not clear to me this is really a useful test.
Here are some sample results for 20 tests of PCG and xoroshiro128+ on 100M numbers:
test      chisquare-bits  chisquare-halfoctets  Note
od_...r8  20 pass         9 of 20 pass          PCG
xd_...r8  20 pass         20 of 20 pass         xoroshiro128+
This contrasts with the integer64 tests, where fewer of both the PCG and xoroshiro128+ runs passed the half-octet tests. Rerunning those with the same seeds gives:
test      chisquare-bits  chisquare-halfoctets  Note
pcg       20 pass         6 of 20 pass          PCG
x...128+  20 pass         4 of 20 pass          xoroshiro128+
It is not surprising they differ; the floating-point numbers are derived from the random integers by shifts and, sometimes, replacement of bits.
One variant worth considering is a cryptographic transformation of the random integer before conversion to float, for example AES128 in CTR mode, to see whether that changes the result. It is slower, and no better.
Statistical test suites composed by statistics practitioners are not as complete as you might hope: easy-to-check defects in RNGs are missed, and this has been true for years.
The practitioners have the wrong objective in mind: generating statistics, not generating usable random numbers.
Do the least-significant-bits of pseudo-random floating-point numbers matter? Especially the double-precision kind? Why, for some applications, yes they do. When taking differences, when making exact calculations locally and averaging over large spatial areas (such as particle-in-cell methods), when testing sensitivity of numerical models by masking (truncating values), and so on.
April 2020
Asahi Super Dry
Links:
Pierre L'Ecuyer website
xoshiro / xoroshiro generators and the PRNG shootout
OpenBSD Numerics Experience - 1 - RNG
OpenBSD Numerics Experience - 2 - RNG floats
OpenBSD Numerics Experience - 3 - FFTW
OpenBSD Numerics Experience - 4 - CAF
OpenBSD Numerics Experience - 5 - MPI Networking
OpenBSD Numerics Experience - 6 - Memory Models
OpenBSD Numerics Experience - 7 - Python Image Display
OpenBSD Numerics Experience - 8 - RNGs, again
OpenBSD Numerics Experience - 9 - Nim
OpenBSD Numerics Experience - A - Graphical Display
OpenBSD Numerics Experience - B - ParaView
OpenBSD Numerics Experience - C - Numerical Debugging
OpenBSD Numerics