Fortran Execution Time Benchmarks
Updated 11/2010

Fortran Execution Time Benchmarks - 64-bit Linux on Intel Core i7 920

| Benchmark | Absoft 11.1 | g95 0.92 | GFortran 4.3.2 | Intel 12.0 | Lahey 8.1 | NAG 5.2 | PGI 10.9 | Sun 8.4 |
|---|---|---|---|---|---|---|---|---|
| AC | 6.22 | 13.94 | 9.60 | 10.21 | 11.01 | 36.65 | 10.44 | 34.71 |
| AERMOD | 17.30 | 37.83 | 30.78 | 13.95 | 15.97 | 26.03 | 16.40 | 16.29 |
| AIR | 2.57 | 9.04 | 6.17 | 2.83 | 4.39 | 6.24 | 5.66 | 4.06 |
| CAPACITA | 27.37 | 41.24 | 33.41 | 28.75 | 32.68 | 38.50 | 31.38 | 35.98 |
| CHANNEL | 2.47 | 11.07 | 1.86 | 1.82 | 2.89 | 2.92 | 2.31 | 1.69 |
| DODUC | 26.04 | 30.39 | 28.04 | 25.27 | 25.45 | 30.40 | 25.03 | 21.87 |
| FATIGUE | 5.02 | 24.81 | 7.57 | 11.54 | 8.03 | 9.89 | 6.22 | 5.81 |
| GAS_DYN | 2.44 | 15.79 | 5.25 | 2.59 | 4.47 | 11.01 | 4.00 | 4.06 |
| INDUCT | 6.54 | 34.81 | 28.55 | 8.69 | 21.94 | 28.81 | 28.15 | 31.92 |
| LINPK | 8.81 | 9.47 | 8.80 | 8.74 | 8.71 | 8.78 | 8.38 | 7.65 |
| MDBX | 10.28 | 12.99 | 11.78 | 10.11 | 11.62 | 13.02 | 12.74 | 11.78 |
| NF | 11.51 | 24.24 | 14.41 | 10.52 | 15.95 | 17.40 | 12.54 | 12.20 |
| PROTEIN | 31.68 | 41.82 | 35.41 | 30.55 | 48.03 | 35.40 | 35.78 | 36.43 |
| RNFLOW | 15.23 | 40.63 | 21.25 | 18.48 | 22.52 | 28.77 | 24.78 | 23.16 |
| TEST_FPU | 6.58 | 15.15 | 7.66 | 5.69 | 8.33 | 8.71 | 6.28 | 7.55 |
| TFFT | 2.23 | 2.48 | 2.29 | 2.23 | 2.16 | 2.41 | 2.23 | 2.17 |
| Geometric Mean | 8.01 | 18.39 | 11.30 | 8.62 | 11.00 | 14.19 | 10.46 | 10.81 |
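
The Geometric Mean row condenses each column into one figure. It takes the n-th root of the product of the n timings, so long-running benchmarks such as PROTEIN do not dominate the summary the way they would in an arithmetic mean. The page does not state the formula. The minimal Python sketch below assumes the row is the ordinary geometric mean of the sixteen timings, and with that assumption it reproduces the Absoft figure for the Core i7 table. The names `geometric_mean` and `ABSOFT_I7` are illustrative only.

```python
import math

# Absoft 11.1 timings (seconds) for the 16 benchmarks, copied from the Core i7 920 table.
ABSOFT_I7 = [6.22, 17.30, 2.57, 27.37, 2.47, 26.04, 5.02, 2.44,
             6.54, 8.81, 10.28, 11.51, 31.68, 15.23, 6.58, 2.23]


def geometric_mean(times):
    """Return the n-th root of the product of n positive timings.

    Averaging logarithms instead of multiplying the raw values avoids
    overflow or underflow when there are many timings.
    """
    if not times or any(t <= 0 for t in times):
        raise ValueError("timings must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(t) for t in times) / len(times))


if __name__ == "__main__":
    print(f"{geometric_mean(ABSOFT_I7):.2f}")  # 8.01, matching the Geometric Mean row
```

Python 3.8 and later also provide `statistics.geometric_mean`, which could replace the hand-written function.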

Compiler Switches

| Compiler | Switches |
|---|---|
| Absoft | af95 -m64 -O5 -speed_math=10 -march=core -xINTEGER |
| g95 | g95 -march=nocona -ffast-math -funroll-loops -O3 |
| gfortran | gfortran -march=native -ffast-math -funroll-loops -O3 |
| Intel | ifort -O3 -fast -ipo -no-prec-div |
| Lahey | lf95 --fast -static -x - |
| NAG | nagfor -O4 -Bstatic -ieee=full |
| PGI | pgf95 -Bstatic -V -fastsse -Munroll=n:4 -Mipa=fast,inline |
| Sun | sunf95 -fast -xtarget=nehalem -xipo=2 -m64 -xvector=simd |

Fortran Execution Time Benchmarks - 64-bit Linux on AMD Phenom II

| Benchmark | Absoft 11.1 | g95 0.92 | GFortran 4.4 | Intel 12.0 | Lahey 8.1 | PGI 10.9 | Sun 8.5 |
|---|---|---|---|---|---|---|---|
| AC | 7.36 | 14.71 | 9.47 | 8.75 | 11.51 | 9.90 | 27.95 |
| AERMOD | 18.46 | 37.39 | 30.47 | 15.77 | 18.23 | 17.47 | 15.65 |
| AIR | 3.23 | 11.96 | 6.73 | 3.24 | 4.82 | 7.29 | 4.81 |
| CAPACITA | 32.07 | 55.36 | 45.49 | 37.85 | 42.76 | 34.17 | 51.03 |
| CHANNEL | 3.49 | 14.58 | 3.24 | 4.04 | 4.23 | 4.12 | 3.20 |
| DODUC | 27.64 | 39.98 | 26.32 | 29.13 | 27.07 | 26.72 | 23.17 |
| FATIGUE | 4.77 | 28.13 | 6.56 | 13.16 | 8.06 | 6.77 | 6.17 |
| GAS_DYN | 2.63 | 14.05 | 5.41 | 3.13 | 4.72 | 5.14 | 4.02 |
| INDUCT | 6.19 | 64.02 | 16.14 | 7.28 | 22.05 | 24.63 | 26.69 |
| LINPK | 10.80 | 18.16 | 10.68 | 10.78 | 10.66 | 10.64 | 9.09 |
| MDBX | 10.85 | 26.93 | 11.60 | 10.73 | 12.38 | 13.44 | 11.93 |
| NF | 11.47 | 25.79 | 16.12 | 11.74 | 17.76 | 14.15 | 12.57 |
| PROTEIN | 32.63 | 44.51 | 34.76 | 31.10 | 48.11 | 35.48 | 33.80 |
| RNFLOW | 14.99 | 37.90 | 23.86 | 18.93 | 25.03 | 30.18 | 25.85 |
| TEST_FPU | 9.39 | 24.96 | 7.88 | 8.16 | 8.70 | 8.32 | 8.89 |
| TFFT | 4.00 | 4.19 | 3.93 | 3.86 | 3.77 | 4.16 | 3.85 |
| Geometric Mean | 9.19 | 23.98 | 12.14 | 10.26 | 12.56 | 12.38 | 12.11 |

Compiler Switches

| Compiler | Switches |
|---|---|
| Absoft | af90 -m64 -O5 -speed_math=10 -fast_math -march=barcelona -xINTEGER -stack:0x8000000 |
| g95 | g95 -march=opteron -ffast-math -funroll-loops -O3 |
| gfortran | gfortran -march=native -ffast-math -funroll-loops -O3 |
| Intel | ifort -O3 -fast -ipo -no-prec-div |
| Lahey | lf95 --fast -static -x - |
| PGI | pgf90 -Bstatic -V -fastsse -Munroll=n:4 -Mipa=fast,inline |
| Sun | sunf95 -fast -xtarget=native |

Notes
All figures are execution times in seconds. The Core i7 figures were measured on a Dell Studio XPS with a Core i7 920 (2.66 GHz) processor and 9 GBytes of 1033 MHz DDR3 memory, running CentOS 5.3 Linux. The Phenom II figures were measured on an AMD Phenom II X4 955 (3.2 GHz) processor running CentOS 5.5. Each figure is the average over at least 10 runs (many more for some), and measurement error is typically below 1%. Green cells highlight figures within 10% of the fastest. Red cells indicate figures more than 150% of the fastest.

So far as possible, we have used the compiler switches that give the best overall results. We have not attempted to tune individual benchmarks, and for particular cases different switch settings may give better results. For all compilers except LF95, the switches were set to generate 64-bit executables.

Thanks are due to Jos Bergervoet for permission to use his CAPACITA benchmark, to Quetzal Associates for permission to use their CHANNEL, FATIGUE, GAS_DYN, INDUCT, PROTEIN and RNFLOW benchmarks, to David Frank for his TEST_FPU benchmark, and to Ted Addison of McVehil-Monnett Associates for permission to use AERMOD, an air quality model used by the US Environmental Protection Agency. All the benchmarks have been modified slightly to fit into our benchmarking harness. The NF benchmark uses "nested factorization", a little-known but very effective iterative linear solver for huge finite difference matrices. A paper describing nested factorization and comparing it to other methods is available here.

This benchmark comparison was produced by Polyhedron Ltd., and this page is reproduced with permission from Polyhedron Ltd.
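
The green/red cell rule from the notes can be applied to any row of the tables. The sketch below makes it concrete, under two assumptions about how the rule reads:

- "within 10% of the fastest" means a time no greater than 1.10 times the row minimum;
- "more than 150% of the fastest" means a time greater than 1.50 times that minimum.

The function `classify_row` and its parameters are hypothetical and are not part of Polyhedron's harness. Applied to the LINPK row of the Core i7 table, it marks PGI and Sun as green and no compiler as red.

```python
# Compiler order and LINPK timings (seconds) copied from the Core i7 920 table.
COMPILERS = ["Absoft", "g95", "GFortran", "Intel", "Lahey", "NAG", "PGI", "Sun"]
LINPK_I7 = [8.81, 9.47, 8.80, 8.74, 8.71, 8.78, 8.38, 7.65]


def classify_row(names, times, green_within=0.10, red_above=1.50):
    """Split one benchmark row into 'green' and 'red' compiler lists.

    Assumed reading of the notes:
      green: time <= fastest * (1 + green_within)
      red:   time >  fastest * red_above
    Compilers meeting neither condition appear in neither list.
    """
    fastest = min(times)
    green = [n for n, t in zip(names, times) if t <= fastest * (1 + green_within)]
    red = [n for n, t in zip(names, times) if t > fastest * red_above]
    return green, red


if __name__ == "__main__":
    green, red = classify_row(COMPILERS, LINPK_I7)
    print("green:", green)  # green: ['PGI', 'Sun']
    print("red:", red)      # red: []
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the highlighting is to the exact cut-offs.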