This is a hard one.
Adjusting the code for the cosmic initializer turned out to be pretty simple, but when compiled it resulted in a link time error:
ld: error: (....) relocation R_X86_64_PC32 out of range: (...) is not in [-2147483648, 2147483647]
Asking @misc resulted in the guidance that compiling with
-mcmodel=medium
is required. What does this mean? After
some research, it turns out:
-mcmodel=small
for the 2GB data+code model,
and -mcmodel=medium
for the 2GB code and 64-bit data model.OK, so rebuild with -mcmodel=medium and the link step works but the program fails at runtime with:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
The backtrace was 64 calls deep, and meaningless:
(gdb) bt
#0 0x00000b2e7c782d60 in ?? ()
#1 0x0000000000000004 in ?? ()
#2 0x00000b31188a09e0 in ?? ()
#3 0x0000000000000020 in ?? ()
#4 0x0000000000000020 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x2b485153b410927a in ?? ()
.... and so on
But when I ran similar (much smaller) test code, it worked.
So I went back to the failing code. Digging deeper into the backtrace required a static build to get a symbol table for the backtrace. And it was deep into the internals of fftw3. Hmm, nothing wrong with that code (by assumption). Here is what it looked like:
(gdb) bt
#0 0x00000efe25ec2f80 in recur () from /usr/local/lib/libfftw3f.so.7.1
#1 0x00000efe25ec2e8e in recur () from /usr/local/lib/libfftw3f.so.7.1
#2 0x00000efe25ec365a in zero () from /usr/local/lib/libfftw3f.so.7.1
#3 0x00000efe25e80dc4 in fftwf_measure_execution_time ()
from /usr/local/lib/libfftw3f.so.7.1
#4 0x00000efe25e7cd2b in evaluate_plan () from /usr/local/lib/libfftw3f.so.7.1
#5 0x00000efe25e7cb47 in search0 () from /usr/local/lib/libfftw3f.so.7.1
#6 0x00000efe25e7b5f3 in mkplan () from /usr/local/lib/libfftw3f.so.7.1
.... and so on
And this was 25 calls deep. Not the same issue. Or, perhaps it was.
Where was I? Linking works, but while running code the program segfaults. Some kind of pointer issue, perhaps? The Fortran code I was testing had no pointers.
It is very easy to make arrays larger than 2GB. But, in fact what the
initializer was doing was calling pencil_fft.f90
which had local
temporary arrays which were only 14M elements, or 56MB.
In Fortran, all statically defined variables end up in the same section. A “section” is a unit of memory allocation as defined by the linker and loader: it has a name, length, read/write attributes, etc. All object files are combined by the linker by matching up their sections and either concatenating them (such as for static variables or executable code ) or merging them (offset tables).
Programmers don’t need to know much about sections until they run into security issues (write XOR with execute, abbreviated W ^ X), or stack size issues, or as in this case, memory addressing issues.
The section here is the .bss section:
$ objdump -h pencil_fft.o
pencil_fft.o: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00002dd8 0000000000000000 0000000000000000 00000040 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000000 0000000000000000 0000000000000000 00002e18 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 000b4ca8 0000000000000000 0000000000000000 00002e20 2**5
ALLOC
where the 0x000b4ca8 size is 740520 bytes (decimal).
Looking into the whole program, the bss section was not too big. So, back to examine the code.
The fft routines declared input and output arrays as integer(C_INT), dimension(*), intent(in)
while the calling code was explicit dimensions dimension(number)
. Maybe that
was it?
The difference is whether the calling sequence delivers the array itself (as with explicit-shape), or some description of it (assumed-shape). Both the routine and the main code must agree otherwise you get dreaded Segmentation faults.
For example
$ cat >t.f90 << __end
program lbug
implicit none
save
real r3(ng,ng,ng) ! explicit shape, ng is a macro
print *,'hello bug'
call someexternal(r3,3)
print *,r3(3,3,3)
endprogram
__end
$ cat >tse.f90 << __end
subroutine someexternal(a,b)
real a(:,:,:) ! assumed shape
print *,'b is ',b
print *,a(1,1,1)
endsubroutine
__end
$ egfortran -c -cpp -Dng=20 t.f90
$ egfortran -c tse.f90
$ egfortran t.o tse.o
$ ./a.out
hello bug
b is 4.20389539E-45
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
A compile-time detection for this happens when both program and subroutine appear in the same file. In this case, you get
call someexternal(r3,3)
1
Error: Explicit interface required for 'someexternal' at (1): assumed-shape argument
To fix this, you can choose to modify the code to be explicit (passing the dimensions along with the array), or explain the interface to the compiler with an INTERFACE definition.
The code then looks like this:
program lbugt
implicit none
save
real, allocatable :: r3(:,:,:) ! deferred shape
interface
subroutine someexternal(x,y)
real, dimension (:,:,:) :: x
real :: y
end subroutine
end interface
print *,'hello bug'
allocate (r3(ng,ng,ng)) ! ng is a macro
call someexternal(r3,3.0)
print *,r3(3,3,3)
endprogram
The results look like:
$ egfortran -cpp -Dng=20 -c tb.f90
$ cat lbugp.f90
subroutine someexternal(a,b)
real a(:,:,:) ! assumed shape
print *,'b is ',b
print *,a(1,1,1)
endsubroutine
$ egfortran -c lbugp.f90
$ egfortran tb.o lbugp.o
$ ./a.out
hello bug
b is 3.00000000
0.00000000
0.00000000
$
This is one subtlety of the new Fortran language spec: arrays are a little more complicated that you might think.
February 2020
OpenBSD Numerics Experience - 1 - RNG
OpenBSD Numerics Experience - 2 - RNG floats
OpenBSD Numerics Experience - 3 - FFTW
OpenBSD Numerics Experience - 4 - CAF
OpenBSD Numerics Experience - 5 - MPI Networking
OpenBSD Numerics Experience - 6 - Memory Models
OpenBSD Numerics Experience - 7 - Python Image Display
OpenBSD Numerics Experience - 8 - RNGs, again
OpenBSD Numerics Experience - 9 - Nim
OpenBSD Numerics Experience - A - Graphical Display
OpenBSD Numerics Experience - B - ParaView
OpenBSD Numerics Experience - C - Numerical Debugging
OpenBSD Numerics