CAF means Coarray Fortran. It began as an extension to the Fortran 95 standard and became part of the language itself in Fortran 2008. It provides some simplified parallelization capabilities to Fortran.
It’s not totally simple. The key idea is to exchange data amongst the nodes through specifically identified variables (coarrays), and not through MPI primitive routines. The specific node is indicated with square brackets: variable[nodenumber].
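As a minimal sketch of the square-bracket idea, here is a small program that is not taken from this article's own sources. The names bracket_demo and val are made up for illustration. It uses only the standard intrinsics this_image() and num_images(), and it should build the same way as the sample programs further down (-fcoarray=lib with -lcaf_mpi).

```fortran
program bracket_demo
  implicit none
  integer :: val[*]      ! coarray: every image owns its own copy of val
  integer :: i

  val = 10 * this_image()   ! no brackets: the copy on this image
  sync all                  ! wait until every image has stored its value

  if (this_image() == 1) then
    do i = 1, num_images()
      ! brackets: read the copy that lives on image i
      print *, 'image', i, 'holds', val[i]
    end do
  end if
end program bracket_demo
```

Without brackets a coarray behaves like an ordinary local variable; the brackets are what turn a reference into communication with another image.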
While the old joke about “you can write a Fortran program in any language” is sometimes true, you can’t write a CAF program in anything but Fortran. CAF is supported by GCC Fortran and some commercial compilers.
Don’t rely on me for this; I’m terrible at explaining stuff, as you can readily see. Look at the references below for intros to Fortran and CAF.
You need to add a new (as yet unsupported) package for this. This package is pretty simple, see the link; untar into /usr/ports/mystuff/lang, cd to opencoarrays, and issue a make package, followed by an install. It requires bash, openmpi, cmake, and g95 to be installed.
What you get includes:
- a caf command which wraps gcc and library specifics. It’s not very reliable and I don’t use it. The --show option is very useful to learn how building and linking works, however.
- a cafrun command which wraps the OpenMPI mpirun command.
- static and dynamic runtimes which are needed for executables.
- an opencoarrays.mod file for accessing some features from Fortran: reductions, image number, total images.

The GCC Fortran compiler includes the -fcoarray= option, where the value is either single or lib. “single” builds a single-image program, so num_images() is always 1. “lib” generates calls into a coarray runtime library that you have to link yourself: either the libcaf_single library included with the fortran package, or -lcaf_mpi provided with this opencoarrays package, which is what you want for more than one image.
Here is a sample program. The rnbasis coarray is dimensioned automatically according to the number of images at runtime:
$ cat > coblin.f90 << __end
program coblin
  use pcg_basic
  implicit none
  ! one RNG state per image; the [*] codimension is set at run time
  type(pcg_state_setseq_64) :: rnbasis[*]
  ! only the last image reports
  if ( this_image() .eq. num_images() ) then
    write (*,*) "this_image()", this_image()
    write (*,*) "this_image( rnbasis )", this_image( rnbasis )
    write (*,*) "lcobound( rnbasis )", lcobound( rnbasis )
    write (*,*) "ucobound( rnbasis )", ucobound( rnbasis )
    write (*,*) "image_index(ucobound(rnbasis))", &
      image_index( rnbasis, ucobound( rnbasis ) )
  end if
end program coblin
__end
$ # my pcg lib is in my $HOME directory
$ il="-I$HOME/gcc/include -L$HOME/gcc/lib"
$ OMPI_FC=egfortran mpif90 \
-fcoarray=lib $il coblin.f90 -lpcg -lcaf_mpi
$ ./a.out
this_image() 1
this_image( rnbasis ) 1
lcobound( rnbasis ) 1
ucobound( rnbasis ) 1
image_index(ucobound(rnbasis)) 1
$ mpirun -np 2 -H localhost:2 ./a.out
this_image() 2
this_image( rnbasis ) 2
lcobound( rnbasis ) 1
ucobound( rnbasis ) 2
image_index(ucobound(rnbasis)) 2
This isn’t the prettiest output, but it shows the lower and upper cobounds of the coarray rnbasis when run with 1 or 2 images. Only the last image prints, because of the test this_image() .eq. num_images().
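The cobound intrinsics are easier to tell apart when the lower cobound is not the default 1. The following is a sketch under that assumption; cobound_demo and x are hypothetical names, and no PCG library is needed. It declares a coarray with cobounds starting at 0 and makes the same inquiries as coblin.

```fortran
program cobound_demo
  implicit none
  integer :: x[0:*]     ! lower cobound 0 instead of the default 1

  ! only the last image reports, as in coblin
  if (this_image() == num_images()) then
    print *, 'this_image()    ', this_image()       ! image number, counted from 1
    print *, 'this_image( x ) ', this_image( x )    ! cosubscript of this image in x
    print *, 'lcobound( x )   ', lcobound( x )
    print *, 'ucobound( x )   ', ucobound( x )
    ! map a cosubscript back to an image number
    print *, 'image_index     ', image_index( x, ucobound( x ) )
  end if
end program cobound_demo
```

Here this_image() still counts images from 1, while this_image( x ) and the cobounds are expressed in the coarray's own 0-based cosubscripts, and image_index translates a cosubscript back into an image number.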
Well, you can run multiple images, but if you are doing RNG-based calculations you need to ensure each image gets its own set of pseudo-random numbers. Otherwise their effort is duplicated.
Let’s check to see if we understand how:
use pcg_basic
implicit none
integer, parameter :: veclen = 6
integer :: perImage(veclen)
! every image seeds its generator with the same values
call pcg32_srandom( 10_8, 3_8 )
! one call; the scalar result fills the whole array
perImage = pcg32_random()
sync all
if (this_image() == 1) then
  print *,' shallow assignment of pcg32_random()'
  print *,perImage
endif
Here, “shallow” assignment means pcg32_random() is called once and that single value is assigned to every element of the array perImage. Since every image seeds the generator identically and this array is local to every image, every image has the same values in this array. This is not too interesting for parallelized programs: they would all do the same thing with identical RNGs.
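The difference between one call spread over an array and one call per element can be seen without any RNG library. This is a small sketch in which next_value is a hypothetical counter function standing in for pcg32_random(); it simply returns 1, 2, 3, ... on successive calls.

```fortran
program shallow_fill
  implicit none
  integer, parameter :: veclen = 6
  integer :: shallow(veclen), deep(veclen)
  integer :: i

  shallow = next_value()      ! one call, result copied to all elements

  do i = 1, veclen
    deep(i) = next_value()    ! one call per element
  end do

  print *, 'shallow:', shallow   ! six copies of the same number
  print *, 'deep:   ', deep      ! six consecutive numbers
contains
  ! stand-in for a generator: each call returns the next integer
  integer function next_value()
    integer, save :: counter = 0
    counter = counter + 1
    next_value = counter
  end function next_value
end program shallow_fill
```

The whole-array assignment evaluates the function a single time, so shallow holds one repeated value while deep holds a fresh value in each element.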
Here we use a local variable for each image, and set it according to the image number:
type(pcg_state_setseq_64) :: singlebasis
type(pcg_state_setseq_64), allocatable :: rnbasis(:)[:]
! one slot per image, allocated on every image
allocate( rnbasis(num_images())[*] )
! seed a local state with an image-dependent sequence number
call pcg32_srandom_r( singlebasis, &
  10_8, 3_8 * this_image() )
! store it in this image's slot of the array on image 1
rnbasis(this_image())[1] = singlebasis
sync all
if(this_image() == 1) then
  print *,'coarray assignment of rng basis for all images (unbuggy)'
  print *, rnbasis(1:num_images())%state
  print *, rnbasis(1:num_images())%inc
end if
This is a bit tricky, so let’s walk through one thought at a time:
- singlebasis is local to each image, and is initialized by the pcg32_srandom_r routine to a value that depends on the image number this_image().
- The rnbasis array is sized according to the number of images, and the square brackets ([:] in the declaration, [*] in the allocate) indicate it is a coarray used for exchanging data between images.
- The expression rnbasis(this_image())[1] = will assign a value to the rnbasis array located on image #1. The subscript this_image() indicates which element of the array is to receive a value.
- The rnbasis(3) element receives a value from image #3, and so on.
- The rnbasis array located on image #1 (rnbasis(:)[1]) receives the values.
- sync all makes every image wait until all of them have written their element.

Then image #1 prints the result. The array’s state and inc values will all be different. Right?
$ mpirun -np 3 -H localhost:3 ./co_vec.x
…
coarray assignment of rng basis for all images (unbuggy)
-2490148636861828604 -1198819441200173800 92509754461481004
7 13 19
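The same gather-to-image-1 pattern can be tried without the PCG library. The sketch below swaps the RNG state for a plain integer; gather_to_one, slots and mine are names invented for this illustration.

```fortran
program gather_to_one
  implicit none
  integer, allocatable :: slots(:)[:]   ! allocatable coarray, one array per image
  integer :: mine

  ! every image allocates the same shape; allocating a coarray
  ! synchronizes the images
  allocate( slots(num_images())[*] )

  mine = 3 * this_image()               ! a value that depends on the image

  ! write into this image's slot of the array that lives on image 1
  slots(this_image())[1] = mine

  sync all                              ! all writes finish before image 1 reads

  if (this_image() == 1) then
    print *, slots                      ! one entry per image: 3, 6, 9, ...
  end if
end program gather_to_one
```

Each image touches only its own element of the array on image 1, so the writes never collide, and the sync all is what makes it safe for image 1 to read them afterwards.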
If you feel like I just threw you into the deep end of the Co-Array Fortran swimming pool, you are right. See the links for a gentler introduction to CAF and getting some ordinary things done. Then you might find this RNG article more understandable.
January 2020 (revised November 2020)
Links
Parallel Programming with Fortran 2008 coarrays
Introduction to Fortran 90, Student notes
OpenBSD Numerics Experience - 1 - RNG
OpenBSD Numerics Experience - 2 - RNG floats
OpenBSD Numerics Experience - 3 - FFTW
OpenBSD Numerics Experience - 4 - CAF
OpenBSD Numerics Experience - 5 - MPI Networking
OpenBSD Numerics Experience - 6 - Memory Models
OpenBSD Numerics Experience - 7 - Python Image Display
OpenBSD Numerics Experience - 8 - RNGs, again
OpenBSD Numerics Experience - 9 - Nim
OpenBSD Numerics Experience - A - Graphical Display
OpenBSD Numerics Experience - B - ParaView
OpenBSD Numerics Experience - C - Numerical Debugging
OpenBSD Numerics