After my last experience, I decided to explore the deep, wide and dangerous ocean of the random-number generation world, with floating-point representations added for flavour.
There are many subtleties involved; the number of academics doing detailed work here is a good indication that there are many ways to do things, and some of them are just plain wrong. And some are just plain good.
(No, not The Plan)
I would do the work in C, and apply an F95 adaptation layer afterwards. This is because much of the bit-twiddling is easier, or indeed only possible, in C rather than Fortran. Fortran only needs to be able to call this stuff.
Also, at this point I'm integrating the CUBEnu code, which means testing components while checking that they connect correctly (accepting and computing valid values, testing assumptions, doing some basic performance testing, and so on). Therefore some testing of the RNGs is required too.
Flang (at least) uses the default rand48() code base. The drand48() source code revealed some old, odd choices in generating pseudo-random numbers. Newer work exists, which I remember seeing occasionally, so a web search for better code seemed an obviously better choice than accepting the (clang/flang/gcc) status quo.
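To see what "old, odd choices" means: drand48() is a 48-bit linear congruential generator, with multiplier and increment fixed by the old SVID specification. A minimal sketch of the recurrence (not the actual libc source; the seed value here is arbitrary):

#include <stdint.h>

static uint64_t lcg48 = 0x1234ABCD330EULL;          /* any 48-bit seed */

double
sketch_drand48(void) {
    /* SVID constants: a = 0x5DEECE66D, c = 11, modulus 2^48 */
    lcg48 = (0x5DEECE66DULL * lcg48 + 0xBULL) & ((1ULL << 48) - 1);
    return (double)lcg48 / (double)(1ULL << 48);    /* [0,1) on a 2^-48 grid */
}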
Of course this plan is written backwards.
Starting with Hacker News, I found several references to PCG, and thence the PCG website, where I explored the fascinating world of numerous PRNG codes, including testing, definitions of "random", and so on.
Factoids such as test suites that run for a month on modern hardware were actually fascinating. Really!
Then, hunt for floating-point conversions (both single and double). On page 4 of a Yahoo search, find the relevant references (Lemire, Occil) and thence the listed Campbell and Downey codes.
OK, so I downloaded pcg-c-basic-0.9.zip and got it running.
I then got distracted trying to write a Fortran 2003 module. This meant fighting the Fortran 2003 notion of "pure" functions, which contrasts with PRNGs, whose whole point is side effects. Eventually I developed an F2003-compatible module that successfully calls the PCG routines.
It also works under GCC's Coarray Fortran support, which is a bit mind-bending itself. But I get ahead of myself here.
OK, enough fun/distraction. Let’s write a test code, and see if some assumptions hold and which things are faster.
Since this is OpenBSD, we will start with arc4random(). Then, use both the Campbell and Downey methods to convert random bit patterns to REAL(4) and REAL(8). We also count the occurrences of 0.0 and 1.0 (hopefully seeing no results of 1.0).
Also, use #ifdef to select the PCG method, as it can be seeded deterministically. This is important for debugging, reproducibility, and reliability testing. The same code will compile with either PCG or arc4random.
(The following describes the resulting code. I've omitted the various missteps and outcome variants made along the way, including discovering that the Downey code was written for SPARC, a big-endian machine.)
The Campbell code uses random64() as its source of random numbers, so I adopt that for this test program. Then adjust for arc4random:
#ifdef ARC4
uint64_t
random64(void) {
    uint64_t x;

    arc4random_buf(&x, 8);      /* fill all 8 bytes with random data */
    return x;
}
#endif
and for the PCG code:
#ifdef PCG
uint64_t
random64(void) {
    static int pcginitialized = 0;

    if (!pcginitialized) {
        pcg32_srandom(arc4random(), arc4random());  /* seed state and sequence */
        pcginitialized = 1;
    }
    /* splice two 32-bit PCG outputs into one 64-bit value */
    return ((uint64_t)pcg32_random() << 32) + (uint64_t)pcg32_random();
}
#endif
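Note that seeding from arc4random() gives a different stream on every run; for the reproducible runs mentioned above, seed pcg32_srandom() with fixed constants instead, e.g.:

    pcg32_srandom(42u, 54u);    /* any fixed (initstate, initseq) pair */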
The Downey code instead uses random(), so adjust in the same way:
#ifdef PCG
long
random(void) {
    static int pcginitialized = 0;

    if (!pcginitialized) {
        pcg32_srandom(arc4random(), arc4random());
        pcginitialized = 1;
    }
    return pcg32_random();
}
#endif
(random() is arc4random-backed by default on OpenBSD, so a separate ARC4 fragment is not needed.)
The test code is then a series of fragments that calculate a large number (like 10 billion) of pseudo-random floating-point numbers using both the Downey and Campbell methods, counting the zeros and ones as we go:
startclock(&btime);
nZeros = nOnes = 0;
for (j = 0; j < NLOOP; j++)
    for (i = 0; i < NTEST; i++) {
        rcandidate = random_real_64();  /* substitute functions here */
        if (rcandidate == 0.0) nZeros++;
        if (rcandidate == 1.0) nOnes++;
    }
stopclock(&btime, &timeperrand);
printf(REPORT, argv[0], "random_real_64",
    nZeros, nOnes, 1e9 * timeperrand, 1e-6 / timeperrand);
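startclock(), stopclock(), and REPORT are helpers not shown here; a minimal sketch of what they might look like, using POSIX clock_gettime() and the NLOOP/NTEST counts from the loop above (the per-call averaging is my assumption):

#include <time.h>

static void
startclock(struct timespec *t0) {
    clock_gettime(CLOCK_MONOTONIC, t0);
}

/* Store the average seconds per call into *timeper. */
static void
stopclock(const struct timespec *t0, double *timeper) {
    struct timespec t1;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    *timeper = ((t1.tv_sec - t0->tv_sec) +
        1e-9 * (t1.tv_nsec - t0->tv_nsec)) / ((double)NLOOP * NTEST);
}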
And get this kind of result:
./test_rrng.xpcg my_random_float2 zeros 0 ones 0 time_ns 33.25 rate_M 30.08
and
./test_rrng.xarc4 random_real zeros 0 ones 0 time_ns 372.63 rate_M 2.68
This shows the two extremes: the Downey "my_random_float2" based on PCG is much quicker (about 30 million values per second) than the Campbell "random_real" based on arc4random (under 3 million per second). This is on my old Penryn.
Now that we have a fairly fast combination of code swiped from the web, let’s integrate it into a package. Let’s use the pcg_basic routines as a model:
...
// pcg32_random()
// pcg32_random_r(rng)
// Generate a uniformly distributed 32-bit random number
uint32_t pcg32_random(void);
uint32_t pcg32_random_r(pcg32_random_t* rng);
...
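For reference, the global pcg_basic API is used like this (a minimal sketch against pcg_basic.h from pcg-c-basic-0.9):

#include <stdio.h>
#include <stdint.h>
#include "pcg_basic.h"

int
main(void) {
    pcg32_srandom(42u, 54u);            /* fixed seeds: reproducible stream */
    for (int i = 0; i < 4; i++)
        printf("0x%08x\n", pcg32_random());
    return 0;
}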
So, add our own (Fortran-ish) spellings of the O'Neill/Downey method:
// Downey method for float/double a.k.a. real(4)/real(8)
// od_random_r4()
// od_random_r4_r(rng)
// od_random_r8()
// od_random_r8_r(rng)
float od_random_r4(void);
float od_random_r4_r(pcg32_random_t* rng);
double od_random_r8(void);
double od_random_r8_r(pcg32_random_t* rng);
And a set of f90 files declaring the C-interface to these. They look like this:
interface
    subroutine pcg32_srandom(initstate, initseq) bind(c)
        use iso_c_binding
        implicit none
        integer(C_INT64_T), value :: initstate
        integer(C_INT64_T), value :: initseq
    end subroutine pcg32_srandom
end interface
…and so on.
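The idea of the Downey method behind those od_ names, in simplified form (this sketch draws the exponent geometrically with the GCC/Clang __builtin_ctzll, builds on random64() from above, and skips Downey's careful rounding when the mantissa is zero):

#include <stdint.h>
#include <string.h>

/* Simplified Downey-style float in [0,1): each zero coin-flip halves the
 * value (geometric exponent), then 23 random bits fill the mantissa. */
float
downey_sketch_r4(void) {
    int exp = 126;                      /* biased exponent for [0.5, 1) */
    uint64_t r;
    uint32_t mant, bits;
    float f;

    while ((r = random64()) == 0) {     /* 64 zero flips in a row */
        exp -= 64;
        if (exp < 0)
            return 0.0f;                /* needs well over 100 zero flips */
    }
    exp -= __builtin_ctzll(r);          /* zero flips before the first one */
    if (exp < 0)
        return 0.0f;
    mant = (uint32_t)(random64() >> 41);        /* top 23 bits */
    bits = ((uint32_t)exp << 23) | mant;
    memcpy(&f, &bits, sizeof f);        /* assemble the IEEE 754 single */
    return f;
}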
This will all be available on GitHub eventually.
After the endianness crisis, where I had to rewrite the bit-twiddling for Downey, I was led to the notion of calculating a value in [1,2) and subtracting 1. The link given on Stack Overflow was question 5015133, which led to LD Crocker's ojrandlib. See the links below.
That too is added to my lib in my own versions for r4 and r8.
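The trick is compact; a minimal sketch for real(8), again assuming random64() from above:

#include <stdint.h>
#include <string.h>

/* [1,2) minus 1: force the exponent of 1.0, fill the 52 mantissa bits
 * with random bits, then subtract 1.0 to land in [0,1). */
double
onetwo_sketch_r8(void) {
    uint64_t bits = (random64() >> 12) | 0x3FF0000000000000ULL;
    double d;

    memcpy(&d, &bits, sizeof d);        /* a double in [1.0, 2.0) */
    return d - 1.0;
}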
Note that this method is much less random at small values, such as below 0.01 or 0.001. By "less random" I mean the mantissa has trailing zeros for small numbers, which of course is not random. A small value might be 1.1920929e-07, which is 0x1p-23: one "on" bit followed by 23 "off" bits.
A small value from the od_ routines might be 2.3952118e-10 or 0x1.075b38p-32, which clearly has more than one bit in the mantissa. Neither method has issues with larger numbers near 1.0.
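You can check such values directly with printf's %a hex-float format:

#include <stdio.h>

int
main(void) {
    float a = 1.1920929e-07f;   /* smallest nonzero from the [1,2)-1 method */
    float b = 2.3952118e-10f;   /* a small od_ value */

    /* expect 0x1p-23 and 0x1.075b38p-32 (exact formatting may vary) */
    printf("%a\n%a\n", a, b);
    return 0;
}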
Another thing to note: the od_ routines never produce 0.0. If you expect to see 0.0, prepare to be disappointed. Read the code and explain why it is possible but not observed.
December 2019
GH Mumm Cordon Rouge
Links
Taylor Campbell - Uniform Random Floats
Lee Daniel Crocker - ojrandlib
Allen B Downey - Generating Pseudo-random…
Daniel Lemire - How many floating-point numbers…
Peter Occil - Random Number Generation…
Melissa O'Neill - PCG, A Better Random Number Generator
Alastair J. Walker - Fast generation of uniformly distributed pseudorandom numbers with floating-point representation, Electronics Letters 10 (1974): 533-534