Experiences with Numerics on OpenBSD - 4 - CAF


CAF means Co-Array Fortran, an extension to the Fortran 95 standard that adds some simplified parallelization capabilities to the language. Coarrays became part of the standard itself with Fortran 2008.

It’s not totally simple, but it does provide:

The key idea is to exchange data among the nodes through specially declared variables (coarrays) rather than through calls to MPI primitive routines. The node being addressed is given in square brackets after the variable name: variable[nodenumber].
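To make the square-bracket notation concrete, here is a minimal sketch. It is not taken from the references; the program name cosquare and the variable val are made up for illustration. Without brackets a coarray refers to the local copy, and with brackets it refers to the copy on the named image.

```fortran
program cosquare
    implicit none
    integer :: val[*]          ! hypothetical coarray: one copy of val per image

    val = 10 * this_image()    ! no brackets: each image sets its own copy
    sync all                   ! wait until every image has stored its value

    if (this_image() == 1) then
        ! square brackets: image 1 reads the copy held by the last image
        write (*,*) "val on image", num_images(), "is", val[num_images()]
    end if
end program cosquare
```

Run as a single image, the last image is image 1 itself, so the value read is 10. With two images, image 1 reads 20 from image 2 without any explicit send or receive call.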

While the old joke about “you can write a Fortran program in any language” is sometimes true, you can’t write a CAF program in anything but Fortran. CAF is supported by GCC Fortran and some commercial compilers.

The basics

Don’t rely on me for this; I’m terrible at explaining stuff, as you can readily see. Look at the references below for intros to Fortran and CAF.

CAF on OpenBSD

You need to add a new (as yet unsupported) package for this. The package is pretty simple, see the link: untar it into /usr/ports/mystuff/lang, cd to opencoarrays, and issue a make package, followed by an install. It requires bash, openmpi, cmake, and g95 to be installed.

What you get includes:

The GCC Fortran compiler includes the -fcoarray= option, where the value is either single or lib. “single” builds a program that always runs as exactly one image. “lib” generates calls into a coarray runtime library that you name at link time, which should be the -lcaf_mpi provided with this opencoarrays package (the libcaf_single library included with the fortran package is a one-image stand-in for it).
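A quick way to check both modes is a tiny program that only reports image numbers. This is a sketch: the file name hello_caf.f90 is made up, and the build commands in the comments simply reuse the compiler wrapper and flags shown elsewhere in this article.

```fortran
! Hypothetical file hello_caf.f90
!
! Single-image build (no MPI needed):
!     egfortran -fcoarray=single hello_caf.f90
! Library build, linked against opencoarrays:
!     OMPI_FC=egfortran mpif90 -fcoarray=lib hello_caf.f90 -lcaf_mpi
!     mpirun -np 2 -H localhost:2 ./a.out
program hello_caf
    implicit none

    ! Every image executes this line; the order of the lines is not fixed.
    write (*,*) "image", this_image(), "of", num_images()
end program hello_caf
```

The single build always reports one image. The lib build reports as many images as mpirun starts.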

Test Open Coarrays

Here is a sample program. The rnbasis variable is dimensioned automatically according to the number of images at runtime:

$ cat > coblin.f90 << __end
program coblin
use pcg_basic
implicit none

    type(pcg_state_setseq_64) :: rnbasis[*]

if ( this_image() .eq. num_images() ) then
    write (*,*) "this_image()", this_image()
    write (*,*) "this_image( rnbasis )", this_image( rnbasis )
    write (*,*) "lcobound( rnbasis )", lcobound( rnbasis )
    write (*,*) "ucobound( rnbasis )", ucobound( rnbasis )
    write (*,*) "image_index(ucobound(rnbasis))", &
       image_index( rnbasis, ucobound( rnbasis ) )
end if
end program coblin
__end
$ # my pcg lib is in my $HOME directory
$ il="-I$HOME/gcc/include -L$HOME/gcc/lib"
$ OMPI_FC=egfortran mpif90 \
    -fcoarray=lib $il coblin.f90 -lpcg -lcaf_mpi
$ ./a.out
 this_image()           1
 this_image( rnbasis )           1
 lcobound( rnbasis )           1
 ucobound( rnbasis )           1
 image_index(ucobound(rnbasis))           1
$ mpirun -np 2 -H localhost:2 ./a.out
 this_image()           2
 this_image( rnbasis )           2
 lcobound( rnbasis )           1
 ucobound( rnbasis )           2
 image_index(ucobound(rnbasis))           2

This isn’t the prettiest output, but it shows the lower and upper cobounds of the coarray rnbasis when run with 1 or 2 images.

RNG and coarrays

Well, you can run multiple images, but if you are doing RNG-based calculations you need to ensure each image gets its own set of pseudo-random numbers. Otherwise their effort is duplicated.

Let’s check to see if we understand how:

use pcg_basic
implicit none

integer, parameter :: veclen = 6
integer :: perImage(veclen)

call pcg32_srandom( 10_8, 3_8 )
perImage=pcg32_random()
sync all
if(this_image() == 1) then
    print *,' shallow assignment of pcg32_random()'
    print *,perImage
endif

Here, “shallow” assignment means calling pcg32_random() once and assigning that single value to every element of the array perImage. Since this array is local to every image, and every image seeds the generator identically, every image has the same values in this array. This is not too interesting for parallelized programs: they would all do the same thing with identical random numbers.
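The difference between a shallow assignment and an element-by-element fill can be seen without the pcg library. In this sketch, next_value() is a made-up stand-in for pcg32_random() that counts how often it is called. It is not part of pcg_basic.

```fortran
module counter_mod
    implicit none
    integer :: ncalls = 0          ! how many times next_value() has run
contains
    ! Hypothetical stand-in for pcg32_random(): a new value on every call
    function next_value() result(v)
        integer :: v
        ncalls = ncalls + 1
        v = 100 + ncalls
    end function next_value
end module counter_mod

program shallow_vs_loop
    use counter_mod
    implicit none
    integer, parameter :: veclen = 6
    integer :: perImage(veclen)
    integer :: i

    ! Shallow: the function runs once and its scalar result fills the array
    perImage = next_value()
    write (*,*) "shallow:", perImage
    write (*,*) "calls so far:", ncalls

    ! Element by element: one call, and one distinct value, per element
    do i = 1, veclen
        perImage(i) = next_value()
    end do
    write (*,*) "loop:", perImage
    write (*,*) "calls so far:", ncalls
end program shallow_vs_loop
```

The shallow assignment leaves six copies of 101 after a single call. The loop produces 102 through 107 and brings the call count to 7.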

Here each image seeds a local variable with a sequence value derived from its image number, then copies that variable into its own slot of a coarray on image 1:

type(pcg_state_setseq_64) ::  singlebasis
type(pcg_state_setseq_64), allocatable :: rnbasis(:)[:]

allocate( rnbasis(num_images())[*] )

call pcg32_srandom_r( singlebasis, &
    10_8, 3_8 * this_image() )
rnbasis(this_image())[1] = singlebasis
sync all
if(this_image() == 1) then
    print *,'coarray assignment of rng basis for all images (unbuggy)'
    print *, rnbasis(1:num_images())%state
    print *, rnbasis(1:num_images())%inc
end if

This is a bit tricky, so let’s walk through one thought at a time:
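The same gather pattern can be sketched with plain integers, which removes the pcg types from the picture. This is an illustration under assumptions, not the original walkthrough: slots stands in for rnbasis and mine stands in for singlebasis.

```fortran
program gather_on_one
    implicit none
    integer, allocatable :: slots(:)[:]   ! stand-in for rnbasis
    integer :: mine                       ! stand-in for singlebasis

    ! Every image allocates the same shape: one slot per image.
    allocate( slots(num_images())[*] )
    slots = 0

    ! Each image computes a value that depends on its image number ...
    mine = 3 * this_image()

    ! Make sure image 1 has finished zeroing its slots before anyone writes.
    sync all

    ! ... and stores it in its own slot of the copy held by image 1.
    slots(this_image())[1] = mine

    ! Nobody reads until every image has finished writing.
    sync all

    if (this_image() == 1) then
        write (*,*) "slots collected on image 1:", slots
    end if
end program gather_on_one
```

Because each image writes to a different element, the writes do not collide. The sync all before the print is what guarantees image 1 sees all of them. With two images, image 1 ends up holding 3 and 6.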

If you feel like I just threw you into the deep end of the Co-Array Fortran swimming pool, you are right. See the links for a gentler introduction to CAF and getting some ordinary things done. Then you might find this RNG article more understandable.

January 2020 (revised November 2020)

Glossary

image
In Co-Array Fortran, one running instance of the executable, executing concurrently with other copies of the same program.
coarray
A variable (scalar or array) in Co-Array Fortran which exists on every image and can be used to exchange data between images.
reduction
An operation, such as sum() or max() producing one result from an array of numbers. Reductions may be calculated on arrays or coarrays.
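
As a small illustration of these three terms together, the following sketch reduces a local array with sum() on each image, then has image 1 add up the per-image results through a coarray. All names are made up for the example.

```fortran
program reduce_demo
    implicit none
    integer :: local(3)     ! ordinary array, private to each image
    integer :: part[*]      ! coarray scalar: one partial result per image
    integer :: total, i

    local = this_image() * [1, 2, 3]
    part = sum(local)       ! reduction over a local array

    sync all                ! all partial results are now in place

    if (this_image() == 1) then
        total = 0
        do i = 1, num_images()
            total = total + part[i]   ! read the partial result of image i
        end do
        write (*,*) "total over", num_images(), "images:", total
    end if
end program reduce_demo
```

With one image the total is 6. With two images it is 6 + 12 = 18.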

Links

Parallel Programming with Fortran 2008 coarrays

Introduction to Fortran 90, Student notes

