Windows
Fortran Execution Time Benchmarks

Updated 11/2010

Fortran Execution Time Benchmarks - 64-bit Windows 7 on Intel Core i7 920

Benchmark        Absoft   FTN95     g95   Intel   Lahey     NAG     PGI
Version            11.1    5.50    0.92    12.0  7.20.0     5.2    10.9
AC                 7.39   15.69   17.16   10.78   16.55   20.26   10.41
AERMOD            16.17   34.19   42.31   14.81   23.28   37.10   16.39
AIR                2.44    9.37    9.42    2.85    7.67    7.16    6.03
CAPACITA          29.55   65.76   46.08   28.60   47.72   50.20   31.97
CHANNEL            2.51    5.51    9.34    1.67    4.15    3.08    2.43
DODUC             28.57   46.05   35.07   31.16   30.83   39.82   26.85
FATIGUE            5.12   18.26   45.08   12.67   12.95   10.60    7.66
GAS_DYN            1.83   25.69   26.85    3.62   10.35   19.88    4.23
INDUCT             9.98   79.62   36.96    8.69   68.94   41.90   28.38
LINPK              8.32    8.38    9.50    8.22    7.59    8.74    8.34
MDBX              11.05   20.79   15.14   10.75   14.78   13.87   13.25
NF                12.35   24.57   25.66   10.62   19.33   17.30   13.33
PROTEIN           30.96   54.52   52.40   30.76   55.26   45.96   34.91
RNFLOW            16.33   30.87   31.09   19.24   24.05   35.96   26.33
TEST_FPU           8.54   16.28   16.62    5.66   10.34   10.95    6.51
TFFT               2.31    3.17    2.73    2.18    2.39    2.63    2.29
Geometric Mean     8.43   20.73   20.83    9.00   15.57   16.47   10.90
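
The Geometric Mean row summarizes each compiler's column as the n-th root of the product of its n benchmark times. As a minimal sketch of that arithmetic, the Python below recomputes the figure for the Absoft column of the table above. The function name `geometric_mean` and the variable `absoft_times` are our own illustrative choices and are not part of the benchmarking harness.

```python
import math

# Absoft 11.1 execution times in seconds, copied from the
# Intel Core i7 920 table (AC through TFFT, 16 benchmarks).
absoft_times = [7.39, 16.17, 2.44, 29.55, 2.51, 28.57, 5.12, 1.83,
                9.98, 8.32, 11.05, 12.35, 30.96, 16.33, 8.54, 2.31]


def geometric_mean(times):
    """Return the n-th root of the product of n positive values."""
    # Averaging logarithms is equivalent to taking the n-th root of the
    # product and avoids very large intermediate products.
    return math.exp(sum(math.log(t) for t in times) / len(times))


print(f"{geometric_mean(absoft_times):.2f}")
```

The result agrees with the 8.43 shown for Absoft in the table. Unlike an arithmetic mean, the geometric mean is not dominated by the longest-running benchmarks, so a 10% improvement counts the same on TFFT as on PROTEIN.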

Compiler Switches
Absoft af90 -m64 -O5 -speed_math=10 -fast_math -march=core -xINTEGER -stack:0x8000000

FTN95 ftn95 /p6 /optimize (slink was used to increase the stack size)
g95 g95 -march=nocona -ffast-math -funroll-loops -O3
gfortran gfortran -O3 -march=native -funroll-loops -ffast-math
Intel ifort /O3 /Qipo /QxHost /Qprec-div /Qparallel /link /stack:64000000
Lahey lf95 -inline (35) -o1 -sse2 -nstchk -tp4 -ntrace -unroll (6) -zfm
NAG f95 -O4 -V
PGI pgf90 -V -fastsse -Munroll=n:4 -Mipa=fast,inline

Fortran Execution Time Benchmarks - 64-bit Windows 7 on AMD Phenom II

Benchmark        Absoft   FTN95     g95   Intel   Lahey     PGI
Version            11.1    5.50    0.92    12.0     7.2    10.9
AC                 8.66   15.98   15.33    8.63   18.08    9.98
AERMOD            18.84   32.54   40.97   16.66   26.34   18.17
AIR                3.14   11.50    9.74    3.29    9.84    7.42
CAPACITA          35.86   65.20   58.80   38.36   63.86   36.06
CHANNEL            3.75    7.25    8.21    3.99    4.96    3.96
DODUC             30.23   48.90   38.13   30.91   37.24   27.04
FATIGUE            4.83   20.26   46.47   14.91   13.17    8.19
GAS_DYN            2.48   21.23   22.53    3.39    9.24    5.18
INDUCT            10.38   86.59   48.31    7.31   68.61   25.35
LINPK             10.95   11.00   11.24   10.50   10.51   10.61
MDBX              12.27   23.94   20.45   10.73   19.53   14.10
NF                16.10   28.41   31.22   13.87   24.69   14.72
PROTEIN           31.28   55.65   56.56   30.71   57.53   35.02
RNFLOW            15.88   35.85   33.67   19.47   28.77   30.55
TEST_FPU          10.35   17.39   17.95    8.12   11.29    8.58
TFFT               4.19    5.45    4.61    4.02    4.06    4.41
Geometric Mean    10.12   23.25   23.07   10.60   18.43   12.77

Compiler Switches
Absoft af90 -m64 -O5 -speed_math=10 -fast_math -march=barcelona -xINTEGER -stack:0x8000000

g95 g95 -march=opteron -ffast-math -funroll-loops -O3
gfortran gfortran -march=native -ffast-math -funroll-loops -O3
Intel ifort /O3 /Qipo /QxHost /Qprec-div- /Qparallel /link /stack:64000000
Lahey lf95 --fast -static -x -
PGI pgf90 -Bstatic -V -fastsse -Munroll=n:4 -Mipa=fast,inline
Sun sunf95 -fast -xtarget=native

Notes

All figures are execution times in seconds, measured on a machine with an AMD Phenom II X4 955 processor (3.2 GHz) running 64-bit Windows 7. Each figure is the average over at least 10 runs (many more for some). Measurement error is typically less than 1%. Green cells highlight figures within 10% of the fastest. Red cells indicate figures which are more than 150% of the fastest.
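
The highlighting rule can be sketched in a few lines of Python. This is an illustration under our own reading of the note, not the code used to produce the page: we assume "within 10% of the fastest" means a time no greater than 1.10 times the row minimum, and "more than 150% of the fastest" means a time greater than 1.50 times the row minimum. The function name `classify_row`, its parameters, and the label "plain" for unhighlighted cells are hypothetical.

```python
def classify_row(times, green_limit=1.10, red_limit=1.50):
    """Label each time 'green', 'red', or 'plain' relative to the row's fastest."""
    fastest = min(times)
    labels = []
    for t in times:
        ratio = t / fastest
        if ratio <= green_limit:
            labels.append("green")   # assumed: within 10% of the fastest
        elif ratio > red_limit:
            labels.append("red")     # assumed: more than 150% of the fastest
        else:
            labels.append("plain")   # neither highlighted nor flagged
    return labels


# AC row from the Intel Core i7 920 table, in column order:
# Absoft, FTN95, g95, Intel, Lahey, NAG, PGI.
ac_times = [7.39, 15.69, 17.16, 10.78, 16.55, 20.26, 10.41]
print(classify_row(ac_times))
```

Under these assumptions only the Absoft figure in the AC row is green, the Intel and PGI figures are left plain, and the remaining four are red.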

As far as possible, we have used the compiler switches which give the best overall results. We have not attempted to tune individual benchmarks, and in particular cases different switch settings may give better results. We have created and used 64-bit executables where possible, and 32-bit executables where the compiler does not offer a 64-bit option.

The settings used for the Intel and Absoft compilers enable autoparallelization. Autoparallelization settings are not used on any other compilers because we found that they produced no significant performance benefits on this benchmark set.

Thanks are due to Jos Bergervoet for permission to use his CAPACITA benchmark, to Quetzal Associates for permission to use their CHANNEL, FATIGUE, GAS_DYN, INDUCT, PROTEIN and RNFLOW benchmarks, to David Frank for his TEST_FPU benchmark, and to Ted Addison of McVehil-Monnett Associates for permission to use AERMOD, an air quality model used by the US Environmental Protection Agency.

All the benchmarks have been modified slightly to fit into our benchmarking harness.

The NF benchmark uses "nested factorization", a little-known but very effective iterative linear solver for huge finite difference matrices. A paper describing nested factorization and comparing it to other methods is available here.

This benchmark comparison was produced by Polyhedron Ltd., and this page is reproduced with permission from Polyhedron Ltd.