Improving Fortran Do Loop Performance by 25%

SUMMARY

Here’s a way to make sure you’re writing efficient code, with an example in Fortran. It’s a neat trick that is non-obvious to an engineer, though it may be more obvious to a computer scientist. It involves a do loop that updates an array every time step. Instead of writing the new values into a temporary array and copying them back, you use the program’s logic to alternate the roles of the two arrays, so each array (each location in memory) is reused on every other time step. I got 25% better performance from this simple change (and it scales). Not clear yet? Let me show you.

BACKGROUND
I’m a mediocre Fortran programmer but I’m learning new tricks with practice, challenges, and going to professor office hours. Taking Scientific Computing and High Performance Computing Systems right now is really increasing the strength of my programming skills. I hope to be an intermediate Fortran programmer by the end of the semester. Below is a sample from a code I wrote for a recent class project. The goal of the project was not to write a sequential version, so I feel comfortable posting a piece of the serial version.

CODE DESCRIPTION

The code steps in time from igen = 1 to gmax. For each cell of the array ‘pop’ it calculates the number of live neighbors (‘naybs’), then applies Conway’s Game of Life rules (birth, survival, death) to decide the cell’s value for the next time step. It writes that new value into ‘buffer’ rather than directly into ‘pop’, so updating one cell can’t corrupt the neighbor counts of the cells that follow, and then copies ‘buffer’ back into ‘pop’. This works, but that copy is unnecessary, and you may be sacrificing performance without even realizing it. I wrote EXAMPLE 1 below, and then my instructor said “BUT WHY NOT DO IT BETTER?”, so I went back to my computer and coded EXAMPLE 2.

EXAMPLE 1: Original Code

  do igen = 1, gmax
    naybs = 0
    buffer = 0
! This loop finds neighbor values
    do j = 2, y_limit + 1
      do i = 2, x_limit +1
        naybs(i,j) = pop(i-1,j-1) + pop(i,j-1) + pop(i+1,j-1)+ &
                     pop(i-1,j  )              + pop(i+1,j  )+ &
                     pop(i-1,j+1) + pop(i,j+1) + pop(i+1,j+1)
        ! Birth
        if (pop(i,j)==0 .and. naybs(i,j)==3) then
            buffer(i,j)=1
        ! Survival
        else if (pop(i,j)==1 .and. &
            (naybs(i,j)==2 .or. naybs(i,j)==3)) then
            buffer(i,j)=1
        ! Death
        else
            buffer(i,j)=0
        end if
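      end do
    end do
    ! Copy the new generation back into pop only after the whole sweep, so
    ! every neighbor count above was taken from the old generation
    pop = buffer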
  end do

EXAMPLE 2: Improved Code

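! buffer's border (ghost) cells are never written in this loop, so buffer
! must be set to 0 once before the loop starts (e.g. buffer = 0)
do igen = 1, gmax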
  naybs = 0

  if (mod(igen,2)==1) then

! This loop finds neighbor values
  do j = 2, y_limit + 1
    do i = 2, x_limit +1
      naybs(i,j) = pop(i-1,j-1) + pop(i,j-1) + pop(i+1,j-1)+ &
                   pop(i-1,j  )              + pop(i+1,j  )+ &
                   pop(i-1,j+1) + pop(i,j+1) + pop(i+1,j+1)
      ! Birth
      if (pop(i,j)==0 .and. naybs(i,j)==3) then
          buffer(i,j)=1
      ! Survival
      else if (pop(i,j)==1 .and. &
          (naybs(i,j)==2 .or. naybs(i,j)==3)) then
          buffer(i,j)=1
      ! Death
      else
          buffer(i,j)=0
      end if
    end do
  end do

  else if (mod(igen,2)==0) then

! This loop finds neighbor values
  do j = 2, y_limit + 1
    do i = 2, x_limit +1
      naybs(i,j)=buffer(i-1,j-1)+buffer(i,j-1)+buffer(i+1,j-1)+ &
                 buffer(i-1,j  )              +buffer(i+1,j  )+ &
                 buffer(i-1,j+1)+buffer(i,j+1)+buffer(i+1,j+1)
      ! Birth
      if (buffer(i,j)==0 .and. naybs(i,j)==3) then
          pop(i,j)=1
      ! Survival
      else if (buffer(i,j)==1 .and. &
          (naybs(i,j)==2 .or. naybs(i,j)==3)) then
          pop(i,j)=1
      ! Death
      else
          pop(i,j)=0
      end if
    end do
  end do
  end if
end do
! When gmax is odd, the newest generation ends up in buffer, not pop

I will say that you do have to write more code, but if that’s your hang-up, then you might not be very interested in performance in the first place.

More importantly, note the logic here. I check whether the time step is odd or even with mod(igen,2). On odd steps I read from ‘pop’ and write the next generation into ‘buffer’; on even steps I read from ‘buffer’ and write into ‘pop’. The two arrays trade roles every step, so nothing is ever copied. As far as I know, this can be done more elegantly in C by swapping pointers. I don’t yet know how to use pointers effectively in Fortran, but that may be possible here too, and it would give you both the performance you desire and the brevity that everyone likes. This change improved my performance by 25%, which is dramatic if you are in the scientific computing world.
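
To make the pointer-swapping idea concrete, here is a minimal sketch in Python rather than Fortran, purely for illustration. The function names step and run are hypothetical, and the grids are plain lists of lists with a one-cell border of zeros, mirroring the 2..x_limit+1 loop bounds above. Instead of testing mod(igen,2), it swaps the two names after every generation, which is the same trick the odd/even branches implement.

```python
def step(src, dst):
    """Write the next generation of src into dst, interior cells only.

    Both grids carry a one-cell border of zeros that is never written,
    like the ghost cells outside the Fortran loop bounds 2..x_limit+1.
    """
    for j in range(1, len(src) - 1):
        for i in range(1, len(src[0]) - 1):
            naybs = (src[j-1][i-1] + src[j-1][i] + src[j-1][i+1] +
                     src[j][i-1]                 + src[j][i+1] +
                     src[j+1][i-1] + src[j+1][i] + src[j+1][i+1])
            if src[j][i] == 0 and naybs == 3:
                dst[j][i] = 1                    # birth
            elif src[j][i] == 1 and naybs in (2, 3):
                dst[j][i] = 1                    # survival
            else:
                dst[j][i] = 0                    # death


def run(pop, gmax):
    """Advance pop by gmax generations using two grids and no copying."""
    buffer = [[0] * len(pop[0]) for _ in pop]    # border starts (and stays) 0
    src, dst = pop, buffer
    for _ in range(gmax):
        step(src, dst)
        src, dst = dst, src    # swap the names, like swapping pointers in C
    return src                 # holds the newest generation (pop or buffer)
```

For example, a horizontal three-cell “blinker” comes back vertical after one generation and horizontal again after two. Because of the swap, the newest generation can live in either array, which is why run returns src rather than the original pop.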

I note that this may be obvious to others, and they might even think that I made it harder on myself in the first place by doing something silly (which I did), but remember that I was taught this by multiple people, multiple times. Once I started to get an understanding of how data is held in memory, I started to make more advanced strides in programming. Hope this helps!

Timing an OpenMP run using Fortran

What to expect:

  • How you can time a run in Fortran by calling cpu_time
  • How you can time a run in Fortran (or C) with OpenMP, using omp_get_wtime()

Compiler

I am using the Intel Fortran compiler, ifort, from Intel Fortran Composer XE 2011 for Linux (version 11.1). You can find the compiler here, as part of Intel’s Non-Commercial Software Downloads. You can check the version of your ifort with the command

$ ifort -v

The ifort compiler has OpenMP support built in, and OpenMP provides its own timer, omp_get_wtime(). I’ll also show a way to time the run natively in Fortran, with the cpu_time intrinsic.

At the beginning of my code, it looks like the following:

Example Code

       program goodtimes

c$     use omp_lib
c      use your_modules    (placeholder: uncomment and add your own modules)

       implicit none

       double precision :: fstart, fend
c      Declare other variables here
       double precision :: ostart, oend

c      Fortran timing
       call cpu_time (fstart)

c      OpenMP timing
c$     ostart = omp_get_wtime()

c      Start of your meaningful code

c      Middle of your meaningful code

c      End of your meaningful code

c      End Fortran timing
       call cpu_time (fend)

c      End OpenMP timing
c$     oend = omp_get_wtime()

       write(*,*) 'Fortran CPU time elapsed', fend-fstart
c$     write(*,*) 'OpenMP Walltime elapsed', oend-ostart

       end program

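There are a few things to mention about the loosely written code above. I wrote it in a Fortran 77 style (fixed form, so save it with a .f extension), where statements start in column 7 or later, and the OpenMP conditional-compilation sentinel is ‘c$’, ‘!$’, or ‘*$’ in columns 1-2. I used ‘c$’ above. Lines that begin with the sentinel are compiled only when OpenMP is enabled; otherwise they are treated as comments. In free-form source (Fortran 90 and later), use ‘!$’.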

Notice that you must use the ‘omp_lib’ module in order to access the built-in ‘omp_get_wtime’. Otherwise, you will get an error. I strongly recommend making your start and end variables double precision. It doesn’t matter how you specify them as double precision, and I don’t necessarily recommend the way I did it above, but I just want to make it clear that you will have a better time with a double precision specification.

Note that cpu_time reports CPU time (how long the CPU was working on your problem), while omp_get_wtime reports wall-clock time, that is, the time you would measure by timing the run from beginning to end with a very precise clock. A few runs of my application showed the CPU operating at about 90% efficiency (where wall time is 100% of the total time). I recommend reading my post about profiling your code, so you can see which regions of your code are time consuming and direct your OpenMP use at those regions.
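
The difference between the two clocks is easy to see with a small experiment. This is a Python sketch, not Fortran: time.process_time() plays the role of cpu_time and time.perf_counter() the role of omp_get_wtime, and the helper name timed is hypothetical. Work that waits instead of computing (here, sleeping) advances the wall clock but barely moves the CPU clock.

```python
import time


def timed(work):
    """Run work() and return (cpu_seconds, wall_seconds).

    time.process_time() counts CPU time used by this process (the role
    cpu_time plays above); time.perf_counter() is a wall-clock timer
    (the role omp_get_wtime plays).
    """
    cpu0, wall0 = time.process_time(), time.perf_counter()
    work()
    return time.process_time() - cpu0, time.perf_counter() - wall0
```

Calling timed(lambda: time.sleep(0.5)) returns a wall time of at least half a second but a CPU time close to zero, because the process spends that half second waiting.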

Remember to include the ‘-openmp’ flag when compiling (newer ifort releases spell it ‘-qopenmp’) and to set the environment variable ‘OMP_NUM_THREADS’. I typically set it in my ~/.bashrc file and then run source ~/.bashrc.
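
If you want to compare thread counts without editing ~/.bashrc each time, the variable can also be set for a single run. Here is a minimal Python sketch with a hypothetical helper, run_with_threads; it copies the current environment and overrides OMP_NUM_THREADS only for the child process, so your shell settings are untouched.

```python
import os
import subprocess


def run_with_threads(cmd, nthreads):
    """Run cmd (a list of strings) with OMP_NUM_THREADS set for this run only.

    Returns the program's standard output; raises CalledProcessError
    if the program exits with a nonzero status.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(nthreads))
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, run_with_threads(["./goodtimes"], 4) would run the timing program above with four threads, assuming you compiled it to an executable named goodtimes.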

The rest of the code is self-explanatory. Read up on Fortran (or C) and OpenMP tutorials and other documentation for any additional information, or feel free to ask questions below. The C techniques are very similar and straightforward.

Installing gcc and gfortran for Mac OS X (10.7.3)

Things you’ll need:

  • Knowledge of how to use the terminal
  • An internet connection
  • A Mac developer account (you can get this as we go along)
  • Copy of Xcode (free)
  • About an hour of your time (30 minutes downloading, 15-30 minutes doing things)

Basic steps:

  1. Download and install Xcode
  2. Download command line tools
  3. Download and install gfortran from other source
Note that if you attempt to only download and install gfortran without gcc you might get the following error!

error trying to exec `as': execvp: No such file or directory

Also note that I performed this installation on a Macbook Air.

gcc

Download and install Xcode by clicking this link, or by searching for it in the Apple App Store, where it can be downloaded for free (see image).

Xcode contains gcc

After you’ve downloaded Xcode, open it and agree to the terms of service. Then navigate to the menu Xcode --> Preferences --> Downloads. Here you’ll see an option to download Command Line Tools (see image). Note that you’ll need a developer account at this stage; I was redirected to Apple’s developer page, where I had to fill out a form and create my account (using my existing Apple ID, so a lot of the form was already auto-filled).

Download Command Line Tools from Preferences --> Downloads

After you have successfully installed the command line tools, open your terminal and type something like:

$ which gcc

which should return the path of your gcc in /usr/bin. All of this should have been taken care of automatically.
gfortran

I mentioned at the beginning that I got an error when attempting to use gfortran before I’d even installed gcc: gcc must be installed in order to use gfortran. Once it was, my gfortran installation went smoothly because it’s very straightforward.

Download gfortran from this link.

After considering my hardware, I chose the option:

Mac OS Lion (10.7) on Intel 64-bit processors (gfortran 4.6.2): download (released on 2011-10-20)

The installation has a walkthrough that comes with the package, like many Mac installations. Straightforward and it should also work automatically. Then, open your terminal and type

$ which gfortran

and it should reveal that it was successfully installed in /usr/local/bin.
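
Once both installs are done, a quick check that everything gfortran depends on is on your PATH can save some head-scratching. This is a Python sketch that assumes the tools of interest are gcc, the assembler as (the program missing in the error above), and gfortran; the helper name missing_tools is hypothetical, and it simply wraps shutil.which, which does the same lookup as the which command.

```python
import shutil


def missing_tools(tools=("gcc", "as", "gfortran")):
    """Return the tools from `tools` that cannot be found on PATH.

    'as' is the assembler that the compilers call; if it is missing,
    gfortran fails with "error trying to exec `as'".
    """
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means all three were found.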

Happy programming!

Profiling a simple Fortran code with gprof

I finished working through Chapman’s Introduction to Fortran 90/95, and it was a very interesting (and helpful) read. My next step is to work through Chapman’s (no relation?) Using OpenMP, but there are some performance considerations I must first address.

Therefore, I looked into gprof, which is the GNU profiling tool. It will give me an understanding of how quickly my code runs, and which tasks in the workflow are taking up the most resources. Here is what the ifort man pages say about the gprof compiler flag (note that I have a 32-bit processor for this test!):

-p
Compiles and links for function profiling with gprof(1).
Architectures: IA-32, Intel® 64 architectures
Arguments: None
Default: OFF
Files are compiled and linked without profiling.
Description: This option compiles and links for function profiling with gprof(1).
Alternate Options:
Linux and Mac OS X: -pg (only available on systems using IA-32 architecture or Intel® 64 architecture), -qp (this is a deprecated option)

That’s interesting, sure! Eventually I want to apply this to a large code, but a large code would make debugging a pain, so I’m going to start with a much simpler test case (taken from Chapman’s Fortran 90/95 book, Example 6-10, pg. 340).

gprof Example with Fortran Code

The example I consider has a function called “ave_value” that calculates the average value of a function between two points, first_value and last_value. The function being averaged, “my_function,” is declared as external in the test driver program “test_ave_value” and passed to “ave_value” as an argument, so “ave_value” is what actually calls “my_function.” It’s a very simple program with three .f90 files.

I wrote these functions based on the example given in Chapman, and then I compiled them with the following command:

$ ifort -p ave_value.f90 my_function.f90 test_ave_value.f90 -o test_ave_value

As a reminder, the -p flag enables profiling for gprof, and the -o flag names the executable.

Now that you have your executable, you can simply run it, as I did:

$ ./test_ave_value

You’ll notice that the run generated a “gmon.out” file, which gprof interprets to show you your statistics! Each run overwrites any gmon.out already in the folder, so use caution. Now, run gprof to interpret the gmon.out file.

$ gprof test_ave_value > tav.output

The tav.output was my re-naming of the gprof output. Now we can view the results of gprof in tav.output, in any competent text editor.
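
Because each profiled run overwrites gmon.out, it can help to set the previous one aside before running again. Here is a minimal Python sketch of that habit; the helper name archive_gmon and the gmon.out.1, gmon.out.2, ... naming scheme are my own assumptions, not part of gprof.

```python
from pathlib import Path


def archive_gmon(directory="."):
    """Rename directory/gmon.out to the first free gmon.out.N (N = 1, 2, ...).

    Returns the new Path, or None if there was no gmon.out to keep.
    """
    gmon = Path(directory) / "gmon.out"
    if not gmon.exists():
        return None
    n = 1
    target = gmon.with_name(f"gmon.out.{n}")
    while target.exists():
        n += 1
        target = gmon.with_name(f"gmon.out.{n}")
    gmon.rename(target)
    return target
```

Call archive_gmon() before each ./test_ave_value run. gprof accepts a profile data file after the executable name, so an archived run can still be read with, for example, gprof test_ave_value gmon.out.1.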

Looking at the Numbers

There is sufficient documentation for understanding gprof numbers on their website, but I’ll hit some critical points. The outputs are separated into the Flat Profile and the Call Graph. The Flat Profile conveys how much time your program has spent executing each function. The Call Graph conveys how much time was spent in the function and its children. You can read more here.
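
The relationship between the two views is simple arithmetic: the flat profile reports each function’s self time, and the call graph adds the time of everything a function calls. Here is a small Python sketch of that sum for the ave_value example; the helper name inclusive_time is hypothetical, the timings are made-up numbers rather than measurements, and it assumes each function is called from only one place.

```python
def inclusive_time(func, self_time, children):
    """Return func's own time plus the inclusive time of everything it calls.

    self_time maps each function to the time spent in its own code (the
    flat profile's view); children maps a function to the functions it
    calls. Assumes each function is called from only one place.
    """
    return self_time[func] + sum(
        inclusive_time(child, self_time, children)
        for child in children.get(func, ())
    )


# Hypothetical numbers in milliseconds, not measured from the real program.
self_time = {"test_ave_value": 10, "ave_value": 50, "my_function": 200}
children = {"test_ave_value": ["ave_value"], "ave_value": ["my_function"]}
```

With these numbers, my_function tops the flat profile with 200 ms of self time, while the call graph charges 250 ms to ave_value and its children, and 260 ms to the driver program.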

Visualization of gprof results

A quick way to put a visualization together (per the documentation of gprof2dot):

gprof path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png

Here, gprof reads your executable together with the gmon.out file written when you ran it (so compile and link with the appropriate flag and run the program first!). Its text output is piped to a program called gprof2dot, which converts it into a graph description, and that is piped to dot, which writes output.png, a file you can view in any competent image display tool!

Note that if you download gprof2dot, you’ll need to change its permissions to make it executable. When I tried to run the downloaded file with

$ ./gprof2dot.py

it would not execute, because the file permissions were not set to executable. You can fix that with

$ chmod +x gprof2dot.py

Now that I learned this, I’m going to try it on a bigger code. Happy profiling!

Installing Python 2.7.2 in Ubuntu 11.10 – UNRESOLVED?

The bottom line: I have a working python installation because I installed it LOCALLY, but when I attempt a global (or system-wide) installation (using sudo), I run into an error I can’t seem to crack.

HISTORY

My goal is to install Python 2.7.2 so I can integrate it with my parallel workflow. It’s not a very ambitious goal. I installed the Intel C compiler (no sweat), MPICH2, and now I’m at this necessary step before I install mpi4py, a wonderful tool developed here.

SYSTEM INFO

During my installation process, I covered two versions of Ubuntu (because I updated midstream). The two versions were the notoriously friendly 10.04, and now 11.10. These are home editions, not server editions. I’ve done a lot of experimental stuff regarding the graphics on my 10.04, so now a lot of stuff is broken, and I thought upgrading to 11.10 would fix a lot of the harm I caused (it did!). I’m using a 64-bit Intel architecture, Core i7.

PACKAGE INFO

I downloaded the Python-2.7.2.tgz package. I unpacked it somewhere friendly. Then I built somewhere else. I usually do this and it has worked out pretty well so far.

COMMAND HISTORY

Here is a list of the commands that I issue. They should work and install Python, in theory.

Command 1
./configure --prefix=$INSTALL_DESTINATION CC=$INTEL_C_COMPILER 2>&1 | tee c.txt

Note that my $INSTALL_DESTINATION was only root accessible, meaning I needed to specify sudo when making any changes to that directory. I do the fancy tee because it is absolutely necessary to keep me from going mad. Printing a history of what I just did and when I did it is great bookkeeping. Keep in mind that I fiddled with my Intel compiler. I tried to use the 32 bit compilers but it wouldn’t configure. I wasn’t sure if that would help, and now I know it doesn’t.
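
For anyone scripting these steps, here is a rough Python equivalent of the cmd 2>&1 | tee file pattern. It is a sketch under my own assumptions (the helper name run_and_tee is hypothetical), not something the Python build requires. One bonus over the shell pipeline: in bash, the exit status of a pipeline is tee’s, so a failed configure can look successful, while this helper returns the command’s own status.

```python
import subprocess
import sys


def run_and_tee(cmd, logfile):
    """Rough equivalent of `cmd 2>&1 | tee logfile`.

    stderr is merged into stdout; each line is echoed to the terminal and
    written to logfile. Returns the command's own exit status.
    """
    with open(logfile, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
    return proc.returncode   # the with-block waited for the process to exit
```

For example, run_and_tee(["make"], "m.txt") mirrors Command 2 below and lets you stop early if the returned status is nonzero.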

Command 2
make 2>&1 | tee m.txt

Again, this is a simple command that sends its output to a text file. At this stage, I got some errors. I did not format the output for your reading enjoyment.

compilation aborted for /home/benjamin/Documents/installs/Python-2.7.2/Modules/_ctypes/libffi/src/x86/ffi64.c (code 2)

Python build finished, but the necessary bits to build these modules were not found:
_bsddb _sqlite3 _tkinter
bsddb185 bz2 dbm
dl gdbm imageop
readline sunaudiodev
To find the necessary bits, look in setup.py in detect_modules() for the module’s name.
Failed to build these modules:
_bisect _codecs_cn _codecs_hk
_codecs_iso2022 _codecs_jp _codecs_kr
_codecs_tw _collections _csv
_ctypes _ctypes_test _curses
_curses_panel _elementtree _functools
_hashlib _heapq _hotshot
_io _json _locale
_lsprof _multibytecodec _multiprocessing
_random _socket _ssl
_struct _testcapi array
audioop binascii cmath
cPickle crypt cStringIO
datetime fcntl future_builtins
grp itertools linuxaudiodev
math mmap nis
operator ossaudiodev parser
pyexpat resource select
spwd strop syslog
termios time unicodedata
zlib

With that many modules failing to build, I suspected that make install would not go as expected either. Because I was installing into a root-owned directory, I issued the command

Command 3
$ sudo make install 2>&1 | tee mi.txt

which gave me the error

Traceback (most recent call last):
  File "/opt/Python-2.7/lib/python2.7/compileall.py", line 17, in <module>
    import struct
  File "/opt/Python-2.7/lib/python2.7/struct.py", line 1, in <module>
    from _struct import *
ImportError: No module named _struct
make: *** [libinstall] Error 1

From here, it’s been all suffering and confusion.

A LITTLE PROGRESS

I came across a forum post describing an error very similar to mine (possibly not identical, since their solution is not what solved my problem). They recommended upgrading the make utility. Make had just been upgraded in 2010, its first release since 2006, and those four years of waiting paid off: upgrading reduced my “Failed to build these modules” list from above to this:

Failed to build these modules:

_ctypes

My recommendation here is to update your installation of make. Yet, despite all of these wonderful improvements, my installation still failed with the same error. To check your existing version of make, go into your terminal window and type

$ make -v

To download version 3.82 of make, click here.
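
If you check versions from a script, compare them as numbers rather than as text. Here is a minimal Python sketch that pulls (major, minor) out of the first line of make -v; the helper name make_version is hypothetical, and it assumes the output starts the way GNU Make’s does, for example “GNU Make 3.81”.

```python
import re


def make_version(output):
    """Extract (major, minor) from `make -v` output such as 'GNU Make 3.81'."""
    match = re.search(r"GNU Make (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("output does not look like GNU Make's")
    return int(match.group(1)), int(match.group(2))
```

Tuples compare field by field, so make_version(text) >= (3, 82) tells you whether the upgrade took. The text can come from subprocess.run(["make", "-v"], capture_output=True, text=True).stdout.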

So with make upgraded, I was still running into issues with _struct. I did attempt the solution from the forum post I linked to. It did not work, but I think it’s a good start. A system-wide Python installation wasn’t absolutely necessary for me, and installing it locally was a breeze. I may come back to resolve this later, and I’ll try out any suggestions left in the comments below.

Note: If you keep attempting the installation from source, make sure you run

$ make clean

after every time your

$ make install

fails, before you run make again.

Installing Numpy in Ubuntu 11.04 From Source

Download

In order to obtain the Numpy package, you’ll want to go to their website. Once you’re there you can download the source files. It’s also important to note that there are detailed Numpy installation instructions here. I’m just describing my experience.

Prerequisites

Because Numpy is a tool that requires Python…you’re going to need Python. If you’re using Ubuntu 11.04 and have not yet installed Python (or want to re-install a different version), I have a guide for that here.

Unpacking

I moved my source file to a place where I wanted to unpack it. In my case, I’ve made a local installation of Python, and I plan on making a local installation of Numpy. So let’s say I’m unpacking in:

$ mv numpy-1.6.1rc1.tgz ~/opt
$ cd ~/opt
$ tar -xzvf numpy-1.6.1rc1.tgz
$ rm numpy-1.6.1rc1.tgz

These commands move my Numpy source tarball into my desired source directory, unpack it there, and then remove the tarball. The last step is optional; I usually keep a backup of my source for a while. You can always download the source again later, but I like having it in case I can’t access the internet for some reason, or if I’m installing on a machine that I’m accessing through ssh.
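
The unpacking step can also be done from Python’s standard library, which is handy if you script installs on several machines. This is a minimal sketch (the helper name unpack is hypothetical) that extracts a .tgz into a destination directory and reports the top-level directories it created; it uses tarfile’s safer “data” extraction filter when the running Python provides one.

```python
import tarfile
from pathlib import Path


def unpack(tarball, dest):
    """Extract a .tgz tarball into dest (like `tar -xzvf` run inside dest).

    Returns the sorted names of the top-level entries in the archive.
    """
    dest = Path(dest).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):   # newer Pythons: block unsafe paths
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        return sorted({Path(m.name).parts[0] for m in tar.getmembers()
                       if Path(m.name).parts})
```

For example, unpack("numpy-1.6.1rc1.tgz", "~/opt") should report the numpy-1.6.1rc1 source directory, assuming the tarball unpacks into a single top-level folder as source releases usually do.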

Building

Here is where your installation of Python will serve you. Python runs the setup.py files that you’ll typically see in source directories. In the case of Numpy, you’ll issue this command from the source directory.

$ python setup.py build

Now here’s where the fun starts. Numpy has a lot of options. I’m only going to address a select few and make you aware of others. If optimization is your interest, you’ll want to check out the options in the site.cfg file so that you can configure ATLAS or BLAS. They’re not necessary and I have not tested their effectiveness. But they are widely used in the field of scientific computing and I highly recommend them!

I’m also going to be using the Intel compilers. That’s a link for instruction provided by the SciPy folks themselves! I chose these options when configuring my build:

$ python setup.py build --fcompiler=intel --compiler=intel build_clib --compiler=intel build_ext

I specified both the Intel C and Fortran compilers. There are two steps to this process. The first is the build, the second is the install.

Installation

Once you have completed your Numpy build, it’s time to install. This is the command I used.

$ python setup.py install --prefix=/home/ben/opt/Python-2.7.2/

This is curious for several reasons. As a new user, when I’d typically run configure (from the world of make files), I felt that setting prefixes (which set the installation directory) would go into the configure step. But python does things a little differently. The ‘configure’ step doesn’t really have a true analog with Python build and install. We specify the installation directory during install!

Notice that I also installed Numpy in the location of my Python installation. The binaries from the installation end up joining my Python binaries. Now, if you followed this tutorial, you’ll notice that you can just type

$ which f2py

from the command line after installation and see that it is in your path! This is because I installed Numpy where Python is installed. Similar to Python, the Numpy executable ‘f2py’ is in $INSTALL_DIR/bin. If you didn’t use the tutorial, no worries! You would adjust your PATH variable similarly as I described in the tutorial, regardless of your installation directory. There is a lot of documentation out there on modifying your PATH if you still have questions. But I highly recommend using the same location as your Python installation. You may have to make some path adjustments if they are different!

Now that f2py is installed and your PATH is configured, try running Python.

$ python
>>> import numpy

If you can successfully import Numpy, you’re almost in business! Start looking for examples to run and get started! If you can pass tests, you can start the really exciting stuff!

Installing Python 2.7.2 in Ubuntu 11.04

Preface: In this example, $ is the prompt. So you don’t type the dollar sign. It’s just a sign that precedes “this is what I typed into my computer terminal”. These instructions are for beginners. Experts are welcome to chime in.

What’s nice about this guide is that because I specify instructions for a local installation, it can be applied to many types of machines (like compute clusters where you do not have root access).

Beginning the Installation

I just installed the Python 2.7.2 software package locally on my desktop. I’m running Ubuntu Natty Narwhal 11.04. I say “locally” because I’m installing it in my home directory, not system-wide. This is useful when you’re not root on your machine or do not have root authority.

Because Python is already installed on my machine, when I issue the command:

$ which python

to find the directory where Python is executed from, I get something like

/usr/bin/python

But I want to install Python in a new, local directory. So I created a directory.

$ mkdir ~/opt

Then I extracted my Python tar.gz file into this directory.

$ tar -xzvf Python-2.7.2.tgz -C ~/opt

I went into this directory to poke around.

$ cd ~/opt/Python-2.7.2

Customizing Your Installation

I read the INSTALL.txt file for information about how to customize my installation. I found the options I want. In order to install Python locally, I specify my “prefix” during the configure step. Prefix is basically the location of the installation directory. Right now you’re in the source directory because that’s where the source code is located. I also want to specify my custom C compiler. I use the Intel compiler because it is pretty awesome. Its location on my machine is:
/opt/intel/composerxe-2011.4.191/bin/ia32/icc
Let’s not confuse the two different locations on my machine. Ubuntu has already set up an /opt directory. But I later created a /home/ben/opt directory because that’s what I wanted. You’ll notice I like to use the ‘~’ symbol to represent $HOME. You can find your value of $HOME by typing:

$ echo $HOME

Configure Python

Anyway, it’s time to configure Python. I had two goals: 1) install it locally and 2) specify my C compiler (so it doesn’t default to gcc). Experts know there are several other ways of doing this. This tutorial is not for experts. In your terminal, you’ll need to be in the Python source directory (where you unpacked your files) to issue this command.

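$ ./configure --prefix=$HOME/opt/Python-2.7.2 CC=/opt/intel/composerxe-2011.4.191/bin/ia32/icc

I wrote $HOME rather than ~ here because configure needs an absolute path, and not every shell expands ~ when it follows --prefix=.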

Now your machine should configure Python. This may take up to a few minutes. The next step is exciting. Your version of Python will be compiled when you issue the next steps.

Python Make

$ make && make install

Be aware that ‘make’ and ‘make install’ are two separate commands. I joined them with ‘&&’ for convenience: ‘make install’ starts as soon as ‘make’ finishes, but only if ‘make’ succeeded, so I don’t have to sit and wait in between (both steps can take a while).

*NOTE* If you previously attempted this step and screwed up, or wrote something you didn’t want to in the configure step, you’ll have to reissue your ‘configure’ with the appropriate options and then:

$ make clean

before you continue with ‘make’ and ‘make install’.

Assigning Your PATH to Find Python

After you finish ‘make install’, Python should finish its compilation. Now I want to do a little organizing. I’d like to be able to execute Python by typing:

$ python

from a fresh terminal window. Right now, if I try to execute Python that way, it launches /usr/bin/python, as I mentioned earlier. This is not what I desire! We can change this by adjusting our PATH. We’ll go into the .bashrc file and edit it with vim. Of course, you’re free to use whatever editor you like.

$ cd ~
$ vim .bashrc

Now once we’re viewing the file, get in insert mode and scroll down to assign your path. My path line looks like this after my edits:

PATH="/home/ben/opt/Python-2.7.2/bin:$PATH"

I put the installation directory that I specified with --prefix into my PATH. It’s represented by /home/ben/opt/Python-2.7.2/. But what does ‘bin’ have to do with this (very few of you ask...but some may)?

Two important things to note. If I had put $PATH at the beginning of my quoted statement, the shell wouldn’t have found my desired Python. That’s because the shell searches the directories in your PATH in order and runs the first match, and that would have been in /usr/bin (or wherever else you previously specified). So I had to move $PATH to the end of my line. Also, Python is not executed from the directory

~/opt/Python-2.7.2

but from

~/opt/Python-2.7.2/bin

The $INSTALL_DIR/bin is where the Python binary is located. That’s the typical location for binaries.
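
The “first match wins” rule is easy to check for yourself. Here is a minimal Python sketch (the helper name resolve is hypothetical) that asks shutil.which, the standard-library version of the which command, to search a list of directories in the same order the shell would.

```python
import os
import shutil


def resolve(command, path_dirs):
    """Return the file that running `command` would execute, searching
    path_dirs in order as the shell does with PATH (first match wins)."""
    return shutil.which(command, path=os.pathsep.join(path_dirs))
```

With the new bin directory listed first, resolve("python", [os.path.expanduser("~/opt/Python-2.7.2/bin"), "/usr/bin"]) points at your new Python; swap the order and it points at /usr/bin/python, which is exactly why $PATH goes at the end of the line.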

Final Steps

Now save your .bashrc file. Close your terminal window and open a new one. Type:

$ which python

It should be the location of the Python you just installed.

$ python -V

This will tell you the version of Python you’re running. Now just type ‘python’ to get started!

$ python