SUMMARY
Here’s a way to make sure you’re optimizing the writing of your code with an example in Fortran. It’s a neat, non-obvious trick to an engineer, but may be more obvious to a computer scientist. It involves the writing of a do loop where you are updating your value for an array, and instead of copying that array into a temporary variable, you simply use the programming logic to continue to use that information (location in memory) during every other time step. I was able to get 25% better performance for this simple change (and it scales). Not clear yet? Let me show you.
BACKGROUND
I’m a mediocre Fortran programmer but I’m learning new tricks with practice, challenges, and going to professor office hours. Taking Scientific Computing and High Performance Computing Systems right now is really increasing the strength of my programming skills. I hope to be an intermediate Fortran programmer by the end of the semester. Below is a sample from a code I wrote for a recent class project. The goal of the project was not to write a sequential version, so I feel comfortable posting a piece of the serial version.
CODE DESCRIPTION
The user is stepping in time from igen to gmax, calculating the neighbor value (‘naybs’) of a cell in the array ‘pop.’ Then, it updates the value of the cell in the array ‘pop’ for the next time step based on the neighbor values. But it does this update in ‘buffer,’ unnecessarily copying the data back to the array ‘pop.’ In this way, the loop proceeds without conflict. This works, but you may be sacrificing performance without even realizing it. I wrote EXAMPLE 1 below and then my instructor said “BUT WHY NOT DO IT BETTER?” and I went back to my computer and coded EXAMPLE 2.
EXAMPLE 1: Original Code
do igen = 1, gmax
naybs = 0
buffer = 0
! This loop finds neighbor values
do j = 2, y_limit + 1
do i = 2, x_limit +1
naybs(i,j) = pop(i-1,j-1) + pop(i,j-1) + pop(i+1,j-1)+ &
pop(i-1,j ) + pop(i+1,j )+ &
pop(i-1,j+1) + pop(i,j+1) + pop(i+1,j+1)
! Birth
if (pop(i,j)==0 .and. naybs(i,j)==3) then
buffer(i,j)=1
! Survival
else if (pop(i,j)==1 .and. &
(naybs(i,j)==2 .or. naybs(i,j)==3)) then
buffer(i,j)=1
! Death
else
buffer(i,j)=0
end if
pop(i,j) = buffer(i,j)
end do
end do
end do
EXAMPLE 2: Improved Code
do igen = 1, gmax
naybs = 0
if (mod(igen,2)==1) then
! This loop finds neighbor values
do j = 2, y_limit + 1
do i = 2, x_limit +1
naybs(i,j) = pop(i-1,j-1) + pop(i,j-1) + pop(i+1,j-1)+ &
pop(i-1,j ) + pop(i+1,j )+ &
pop(i-1,j+1) + pop(i,j+1) + pop(i+1,j+1)
! Birth
if (pop(i,j)==0 .and. naybs(i,j)==3) then
buffer(i,j)=1
! Survival
else if (pop(i,j)==1 .and. &
(naybs(i,j)==2 .or. naybs(i,j)==3)) then
buffer(i,j)=1
! Death
else
buffer(i,j)=0
end if
end do
end do
else if (mod(igen,2)==0) then
! This loop finds neighbor values
do j = 2, y_limit + 1
do i = 2, x_limit +1
naybs(i,j)=buffer(i-1,j-1)+buffer(i,j-1)+buffer(i+1,j-1)+ &
buffer(i-1,j ) +buffer(i+1,j )+ &
buffer(i-1,j+1)+buffer(i,j+1)+buffer(i+1,j+1)
! Birth
if (buffer(i,j)==0 .and. naybs(i,j)==3) then
pop(i,j)=1
! Survival
else if (buffer(i,j)==1 .and. &
(naybs(i,j)==2 .or. naybs(i,j)==3)) then
pop(i,j)=1
! Death
else
pop(i,j)=0
end if
end do
end do
end if
end do
I will say that you do have to write more code, but if that’s your hang-up, then you might not be very interested in performance in the first place.
More importantly, note the logic here. What I’m doing is identifying which time step is odd or even (by computing the mod of each time step), and then based on that result, I will update the next time step with a value from another grid. As far as I know, this can be done more elegantly in C by switching pointers. I don’t even know how to yet use pointers effectively in Fortran, so that might be possible here too. That would then give you the performance you desire, and the brevity that everyone likes. This result did increase my performance by 25%, which is dramatic if you are in the scientific computing world.
I note that this may be obvious to others, and they might even think that I made it harder on myself in the first place by doing something silly (which I did), but remember that I was taught this by multiple people, multiple times. Once I started to get an understanding of how data is held in memory, I started to make more advanced strides in programming. Hope this helps!