The Nim language has a couple of interesting features:
One typical example of numerical integration is the calculation of pi by approximation, specifically by estimation of area of a circle and area of a square.
area of circle = pi * r * r
area of enclosing square is (2*r)*(2*r)
select a random point
if a point is inside the circle then increment n
m = total number points
pi = 4 * n/m, give or take
This is an example of numerical integration but only on two dimensions. Usually this method is used for many more dimensions. Also, you often don’t have an absolute truth to compare your answer against!
The core of an implementation in Nim looks like
proc integpi(rand: Rand): float =
var r = rand
var randx, randy: float
var ctr: float = 0
for i in 0..<Nlim:
randx = float(r.next and int32bits)
randy = float(r.next and int32bits)
if ((randx*randx + randy*randy) < float(int32max)*float(int32max)):
ctr = ctr + 1
return ctr
Here we use 64-bit random integers masked to 32 bits. Then compare the circle boundary x * x + y * y < r * r. Pi is then 4*ctr/Nlim. This code uses floating point because that happened to be a quick way to avoid some Nim value conversion issues by this novice Nim programmer. It is a bit slower than using unsigned int64s throughout.
This is also an example of something that is embarrasingly parallel. You could run independent threads of integpi, and as long as the random number generators are independent, add up the results of each thread. Up to the limit of the density of random numbers in this 32-bit space of numbers, that is.
The threading model in Nim allows basic threads, and the various lock functions allow to synchronize threads.
An interesting feature of threads is they are non-deterministic. They are serial, but the priority and speed of each thread is up to the operating system. If you oversubscribe your CPUs, then you cannot expect anything other than eventual completion.
For example, on 2-core Penryn:
epi64timed.nim using 2 threads x 10100100100 loops
Thread report:
|***************************************************|
|***************************************************|
pi 3.141592653589793 estimated 3.141586181705268 diff -6.471884524739124e-06
The stars begin at earliest thread start time, and end at latest thread completion. Here it is no surprise both threads are concurrent.
epi64timed.nim using 8 threads x 10100100100 loops
Thread report:
|**************************************** |
|************** |
|*************************************** |
|************************************* |
| ************* |
| ************** |
| ***************|
| *************|
pi 3.141592653589793 estimated 3.141588725343425 diff -3.928246368190003e-06
Now that is an 8-thread program on a 2-core machine. Obviously some threads don’t start until later. These are listed in order of thread creation but there is no guarantee this pattern will repeat.
What about Linux, on a 4-core Raspberry Pi?
Here is that result:
pi 3.141592653589793 estimated 3.141593079161661 diff 4.255718679679887e-07
epi64timed.nim using 8 threads x 10100100100 loops
Thread report:
|************************** |
|************************** |
|************************** |
|************************** |
| **************************|
| **************************|
| **************************|
| **************************|
pi 3.141592653589793 estimated 3.141588725343425 diff -3.928246368190003e-06
This pattern tends to be the case for several Linux distributions, where four or so threads are run, they all complete, and the next four threads are run. I see a similar pattern on Mint(Ubuntu) on the Penryn 2-core machine.
Here is the code, annotated, from beginning to end, showing how pi is calculated and how the star-bars are calculated and printed.
# epi64timed.nim
# estimate pi with threads
# use uint64 wherever possible
# report relative thread timings graphically
import math, os, random, strutils, threadpool
import std/monotimes, times
{.experimental: "parallel".}
This code uses various standard modules from Nim. The monotimes module is for timing. It defines the MonoTime type used for Durations. This code uses the still-experimental parallel features of Nim
const realpi = 4.0*math.arctan(1.0)
const Nlim = 10100100100
const int32max = high(int32)
const int32bits = cast[uint64](int32max)
type
intresult = tuple
ctr: uint64
btime, etime: MonoTime
var mypi: float
var nthreads: int
Here we define the truth value of pi, a loop count for each thread, a couple of constants for our 2n x 2n box, where a box is 2 x int32max on a side.
To be clear, high(int32) is a 31-bit number so “int32max” is a 31-bit number too. We also declare a value for the estimate, and a tuple defining the result of a calculation. The tuple includes begin-time and end-time of each thread.
proc integpi(rand: Rand): intresult =
var r = rand
var randx, randy: uint64
var ctr: uint64 = 0
result.btime = getMonoTime()
for i in 0..<Nlim:
randx = r.next and int32bits
randy = r.next and int32bits
if ((randx*randx + randy*randy) < int32bits*int32bits): ctr = ctr + 1
result.etime = getMonoTime()
result.ctr = ctr
This is an implementation of the algorithm above, in uint64 with 31-bit ints randomly sprinkled around a box. The variables are not named well.
proc estimatepi(t: int): float =
var numers = newSeq[intresult](t+2) # 0 not used
var sumnumerator: float = 0.0
var rstate = initRand(23415332)
parallel:
for k in 1..t:
numers[k] = spawn integpi(rstate)
rstate.skipRandomNumbers() # skip 2^64 results
for k in 1..t:
sumnumerator = sumnumerator + float(numers[k].ctr)
This is the parallel code section calling the integpi routine once per thread. The interesting bits for non-Nim programmers are:
Here is the code for the star-bars:
block:
var mint, maxt: MonoTime
var linsp: string
var lspace, espace: int
var difft: float
mint=numers[1].btime
maxt=numers[1].etime
for k in 1..t:
mint=min(numers[k].btime,mint)
maxt=max(numers[k].etime,maxt)
difft=float((maxt-mint).inMilliseconds)
echo " Thread report:"
for k in 1..t:
linsp=" ".repeat(51)
lspace=toInt(50*float((numers[k].btime-mint).inMilliseconds)/difft)
espace=toInt(50*float((numers[k].etime-mint).inMilliseconds)/difft)
for i in lspace..espace: linsp[i] = '*'
echo "|",linsp,"|"
This is a block that prints out the text-based graphs showing relative thread begin/end times. There is no scale factor or actual numbers reported; the intent is to show relative start/stop times, not actual values.
I suspect this code can be simplified to eliminate loops, such as the mint and maxt calculation or placing ‘*’ characters.
Nim is an interesting language with useful features (careful type checking; spaces for grouping and defining implicit blocks; object-oriented features; generics; introspection of the compiler during compilation, etc etc).
$ nim c -d:deploy epi64timed.nim
$ ./epi64timed
May 2020
Asahi Super Dry
Links:
Nim Language site
Nim random module
OpenBSD Numerics Experience - 1 - RNG
OpenBSD Numerics Experience - 2 - RNG floats
OpenBSD Numerics Experience - 3 - FFTW
OpenBSD Numerics Experience - 4 - CAF
OpenBSD Numerics Experience - 5 - MPI Networking
OpenBSD Numerics Experience - 6 - Memory Models
OpenBSD Numerics Experience - 7 - Python Image Display
OpenBSD Numerics Experience - 8 - RNGs, again
OpenBSD Numerics Experience - 9 - Nim
OpenBSD Numerics Experience - A - Graphical Display
OpenBSD Numerics Experience - B - ParaView
OpenBSD Numerics Experience - C - Numerical Debugging
OpenBSD Numerics