Experiences with Numerics on OpenBSD - 4

Experiences with Numerics on OpenBSD - 4 - CAF

CAF means CoArray Fortran, which began as an extension to the Fortran-95 standard and provides some simplified parallelization capabilities to the language. Coarrays were later incorporated into the Fortran 2008 standard itself.

It’s not totally simple, but it does provide:

multiple executions of the same program (“image”) under OpenMPI
exchanging or transferring scalars and arrays between images
data exchanges indicated with assignment statements
synchronization mechanisms

The key idea is to exchange data among the images through specially declared variables (coarrays), rather than through MPI primitive routines. The image to read from or write to is indicated with square brackets: variable[imagenumber].
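As a minimal sketch of that idea (the program and the variable name mailbox are made up for illustration and do not come from the package), each image stores its own number in a scalar coarray, and image 1 then reads every copy using square brackets:

```fortran
program hello_coarray
    implicit none
    integer :: mailbox[*]   ! scalar coarray: every image has its own copy
    integer :: i

    mailbox = this_image()  ! no brackets: refers to the local copy
    sync all                ! wait until every image has stored its value

    if (this_image() == 1) then
        do i = 1, num_images()
            ! the square brackets select which image's copy is read
            print *, 'image', i, 'holds', mailbox[i]
        end do
    end if
end program hello_coarray
```

Note that the remote read is written as an ordinary expression; there is no explicit send or receive call.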

While the old joke about “you can write a Fortran program in any language” is sometimes true, you can’t write a CAF program in anything but Fortran. CAF is supported by GCC Fortran and some commercial compilers.

The basics

Don’t rely on me for this; I’m terrible at explaining stuff, as you can readily see. Look at the references below for intros to Fortran and CAF.

CAF on OpenBSD

You need to add a new (as yet unsupported) package for this. The package is pretty simple; see the link. Untar it into /usr/ports/mystuff/lang, cd to opencoarrays, and issue a make package followed by an install. It requires bash, openmpi, cmake, and g95 to be installed.

What you get includes:

a caf command which wraps gcc and library specifics. It’s not very reliable and I don’t use it. The --show option is very useful to learn how building and linking works, however.
a cafrun command which wraps the OpenMPI mpirun command.
static and dynamic runtimes which are needed for executables
Fortran opencoarrays.mod file for accessing some features from Fortran: reductions, image number, total images.
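To give a rough idea of what a reduction across images looks like, here is a small sketch. It uses the standard intrinsics this_image() and num_images() and the Fortran 2018 collective subroutine co_sum, not anything specific to opencoarrays.mod, whose contents are not shown in this article. The program name and variables are hypothetical.

```fortran
program reduction_demo
    implicit none
    integer :: total, expected

    total = this_image()    ! each image contributes its own image number
    call co_sum(total)      ! afterwards every image holds the overall sum

    ! 1 + 2 + ... + n for n images
    expected = num_images() * (num_images() + 1) / 2
    if (total /= expected) error stop 'co_sum gave an unexpected total'

    if (this_image() == 1) print *, 'sum of image numbers:', total
end program reduction_demo
```

Run as a single image, the sum is simply 1; with more images it should be the sum of all image numbers.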

The GCC Fortran compiler includes the -fcoarray= option, where the value is either single or lib. “Single” uses the libcaf_single library included with the Fortran package and runs as a single image. Specifying “lib” links against whatever coarray library you provide, which here should be -lcaf_mpi from this opencoarrays package.

Test Open Coarrays

Here is a sample program. The upper cobound of the rnbasis coarray is set automatically according to the number of images at runtime:

$ cat > coblin.f90 << __end
program coblin
use pcg_basic
implicit none

    type(pcg_state_setseq_64) :: rnbasis[*]

if ( this_image() .eq. num_images() ) then
    write (*,*) "this_image()", this_image()
    write (*,*) "this_image( rnbasis )", this_image( rnbasis )
    write (*,*) "lcobound( rnbasis )", lcobound( rnbasis )
    write (*,*) "ucobound( rnbasis )", ucobound( rnbasis )
    write (*,*) "image_index(ucobound(rnbasis))", &
       image_index( rnbasis, ucobound( rnbasis ) )
end if
end program coblin
__end
$ # my pcg lib is in my $HOME directory
$ il="-I$HOME/gcc/include -L$HOME/gcc/lib"
$ OMPI_FC=egfortran mpif90 \
    -fcoarray=lib $il coblin.f90 -lpcg -lcaf_mpi
$ ./a.out
 this_image()           1
 this_image( rnbasis )           1
 lcobound( rnbasis )           1
 ucobound( rnbasis )           1
 image_index(ucobound(rnbasis))           1
$ mpirun -np 2 -H localhost:2 ./a.out
 this_image()           2
 this_image( rnbasis )           2
 lcobound( rnbasis )           1
 ucobound( rnbasis )           2
 image_index(ucobound(rnbasis))           2

This isn’t the prettiest output, but it shows the lower and upper cobounds of the coarray rnbasis when run with 1 or 2 images.
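The cobounds do not have to start at 1. The following sketch (a hypothetical program, not part of the package) declares a coarray whose cosubscripts start at 0, so the same inquiry functions can be compared against the default case above. The names counter and me are made up for illustration.

```fortran
program cobound_demo
    implicit none
    integer :: counter[0:*]   ! cosubscripts start at 0 instead of the default 1
    integer :: me(1)

    me = this_image(counter)  ! cosubscript of this image for this coarray

    ! mapping the cosubscript back must give the plain image number
    if (image_index(counter, me) /= this_image()) error stop 'index mismatch'

    if (this_image() == num_images()) then
        print *, 'lcobound:   ', lcobound(counter)
        print *, 'ucobound:   ', ucobound(counter)
        print *, 'cosubscript:', me
        print *, 'image_index:', image_index(counter, me)
    end if
end program cobound_demo
```

With this declaration one would expect the cosubscript to be this_image() minus 1, while image_index() still returns the ordinary image number.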

RNG and coarrays

Well, you can run multiple images, but if you are doing RNG-based calculations you need to ensure each image gets its own set of pseudo-random numbers. Otherwise their effort is duplicated.

Let’s check to see if we understand how:

use pcg_basic
implicit none

integer, parameter :: veclen = 6
integer :: perImage(veclen)

call pcg32_srandom( 10_8, 3_8 )
perImage=pcg32_random()
sync all
if(this_image() == 1) then
    print *,' shallow assignment of pcg32_random()'
    print *,perImage
endif

Here, “shallow” assignment means that pcg32_random() is called once and that single value is assigned to every element of the array perImage. Since this array is local to each image and every image seeds the generator identically, every image has the same values in this array. This is not too interesting for parallelized programs: they would all do the same thing with identical random numbers.
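The single call can be made visible without the pcg library. In this sketch, next_value() is a hypothetical stand-in for pcg32_random(): it just counts how often it has been called and returns that count.

```fortran
program shallow_demo
    implicit none
    integer, parameter :: veclen = 6
    integer :: calls = 0      ! how many times next_value() has run
    integer :: values(veclen)
    integer :: i

    ! scalar function result assigned to a whole array: one call, broadcast
    values = next_value()
    print *, 'broadcast:  ', values, ' calls so far:', calls

    ! one call per element gives distinct values
    do i = 1, veclen
        values(i) = next_value()
    end do
    print *, 'per element:', values, ' calls so far:', calls
contains
    function next_value() result(v)
        integer :: v
        calls = calls + 1
        v = calls
    end function next_value
end program shallow_demo
```

The first line of output shows six identical values after a single call; the second shows six different values after six more calls.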

Here we use a local variable for each image, and set it according to the image number:

type(pcg_state_setseq_64) ::  singlebasis
type(pcg_state_setseq_64), allocatable :: rnbasis(:)[:]

allocate( rnbasis(num_images())[*] )

call pcg32_srandom_r( singlebasis, &
    10_8, 3_8 * this_image() )
rnbasis(this_image())[1] = singlebasis
sync all
if(this_image() == 1) then
    print *,'coarray assignment of rng basis for all images (unbuggy)'
    print *, rnbasis(1:num_images())%state
    print *, rnbasis(1:num_images())%inc
end if

This is a bit tricky, so let’s walk through one thought at a time:

singlebasis is local to each image, and is initialized by the pcg32_srandom_r routine to a value that depends on the image number this_image().
The rnbasis array is sized according to the number of images, and the square bracket declaration [*] indicates it is a coarray used for exchanging data between images.
The expression rnbasis(this_image())[1] = will assign a value to the rnbasis array located on image #1. The subscript this_image() indicates which element of the array is to receive a value.
The rnbasis(3) element receives a value from image #3, and so on.
The rnbasis array located on image #1 (rnbasis(:)[1]) receives the values.
Then image #1 prints the result. The array’s state and inc values will all be different. Right?

$ mpirun -np 3 -H localhost:3 ./co_vec.x
…
coarray assignment of rng basis for all images (unbuggy)
-2490148636861828604 -1198819441200173800 92509754461481004
7 13 19
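The same gather pattern can be tried without the pcg library. In this sketch a plain integer stands in for the pcg_state_setseq_64 value, and the names gathered and local_value are hypothetical. The formula 2*(3*image)+1 is used only because it reproduces the 7 13 19 values printed above; how pcg_basic actually derives inc is an assumption here.

```fortran
program gather_demo
    implicit none
    integer, allocatable :: gathered(:)[:]
    integer :: local_value, i

    ! one slot per image; allocating a coarray also synchronizes the images
    allocate( gathered(num_images())[*] )

    ! local to each image, and different on each image
    local_value = 2 * (3 * this_image()) + 1

    ! write into this image's slot of the array that lives on image 1
    gathered(this_image())[1] = local_value
    sync all                  ! image 1 must not read before all writes are done

    if (this_image() == 1) then
        do i = 1, num_images()
            if (gathered(i) /= 6 * i + 1) error stop 'slot holds a wrong value'
        end do
        print *, gathered
    end if
end program gather_demo
```

Leaving out the sync all would let image 1 print before the other images have delivered their values.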

If you feel like I just threw you into the deep end of the Co-Array Fortran swimming pool, you are right. See the links for a gentler introduction to CAF and getting some ordinary things done. Then you might find this RNG article more understandable.

January 2020 (revised November 2020)

Glossary

image: In Co-Array Fortran, the executable running concurrently with other copies of the same program.
coarray: A variable (scalar or array) in Co-Array Fortran which can be used to exchange data between images.
reduction: An operation, such as sum() or max(), that produces one result from an array of numbers. Reductions may be calculated on arrays or coarrays.

Links

Parallel Programming with Fortran 2008 coarrays

Introduction to Fortran 90, Student notes