Auto-Vectorization on x86 & x64 Processors:
Absoft compilers include auto-vectorization capabilities that use the SIMD instructions of the host processor to restructure code so that multiple operations execute simultaneously. The transformation is automatic and requires nothing from the user other than invoking the auto-vector option. Auto-vectorization is especially effective on loops and in some cases can yield significant speed increases.
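To make the point about loops concrete, here is a minimal sketch, written in C purely for illustration (the same loop shapes arise in Fortran). The function names saxpy and prefix_sum are hypothetical and not taken from any Absoft material. The first loop has independent iterations and is the kind of code an auto-vectorizer usually handles well; the second carries a dependence from one iteration to the next, which typically prevents straightforward vectorization.

```c
#include <stddef.h>

/* Independent iterations: each y[i] depends only on x[i] and y[i],
   so several elements can be processed per SIMD instruction.
   The restrict qualifiers promise that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1, so the iterations cannot simply run side by side. */
void prefix_sum(size_t n, float *y)
{
    for (size_t i = 1; i < n; ++i)
        y[i] = y[i] + y[i - 1];
}
```

Whether a particular compiler vectorizes either loop depends on its options and target; a vectorization report is the way to confirm it.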
Absoft compilers can also generate an auto-vectorization report showing which code segments were vectorized, which were not, and why. The programmer can then review the report and, if desired, modify the existing code for additional performance gains.
Absoft auto-vectorization support is included in Absoft compilers v10.0 and higher for AMD and Intel x86 and x64 processors.
Vector Examples on x86_64:
Performance Graph
Vectorization Report Example
The Vectorization Report shows which loops were vectorized, which were not, and why.
Auto-Vectorization on POWER & G4/G5 Processors:
IBM POWER and Apple G4/G5 processors include hardware vector units which can accelerate application performance. Absoft has partnered with Crescent Bay Software to offer optional VAST vectorization tools tuned for the POWER architecture. These tools make the vectorization process automatic and retain the original source code.
VAST-F/Vector is a preprocessor that examines source code for loops and other code segments that can benefit from vectorization. It then automatically generates new source code that includes vector calls, while the original source code is preserved. The new source code is then compiled in the normal manner with Absoft Fortran.
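The source-to-source idea can be sketched as follows. This is not actual VAST output: it is a hypothetical illustration in portable C, with invented names (add_scalar, add_strip_mined, VLEN), of the general shape such a restructuring takes. The inner fixed-length loop stands in for a single vector call, and a cleanup loop handles elements left over when the trip count is not a multiple of the assumed vector length.

```c
#include <stddef.h>

#define VLEN 4  /* assumed vector width: four 32-bit floats in 16 bytes */

/* Original scalar loop, as it might appear in the input source. */
void add_scalar(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Hypothetical shape of the restructured loop: a main loop that handles
   VLEN elements per trip, followed by a scalar cleanup loop. */
void add_strip_mined(size_t n, const float *a, const float *b, float *c)
{
    size_t i = 0;
    for (; i + VLEN <= n; i += VLEN) {
        for (size_t k = 0; k < VLEN; ++k)   /* stands in for one vector add */
            c[i + k] = a[i + k] + b[i + k];
    }
    for (; i < n; ++i)                      /* leftover elements */
        c[i] = a[i] + b[i];
}
```

Both functions compute the same result; only the loop structure differs, which is why the original source can be kept alongside the generated version.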
Vector Examples on POWER and G4/G5:
Performance Graph
The graph below illustrates the performance benefits of using VAST Vectorization Tools on a Mac G4 system. For this test, longer is better:
VAST-F/Vector Information
VAST Vector:
- Optimization of entire loop nests, not just inner loops. Critical optimizations include loop fusion (squeezing multiple loops into one loop), outer loop unrolling (unrolling an outer loop inside an inner loop), loop collapse (making one long loop from a multi-dimensional loop), and loop interchange (changing the order of the loops in a loop nest to get more efficient memory access).
- Unrolled vector loops. Unrolling vectorized loops is very important in making sure that the vector instructions are overlapped to the maximum extent possible.
- Vectorization of reduction loops. Includes array summations, dot products, minimum and maximum element of an array, product of array elements, etc. These operations take a large fraction of the CPU time for many programs.
- Vectorization of conditional loops. "if" statements and conditional operators are vectorized.
- Non-aligned vectors can be vectorized efficiently. VAST introduces "permute" operations to align vectors "on the fly" prior to computation.
- 32-bit float and 8, 16 and 32-bit integer vectorization. Integers can be signed or unsigned. VAST can also vectorize loops that contain mixed data sizes.
- ALIGNED pragma so that the user can inform VAST-C about arrays that are aligned on 16-byte boundaries. The -Valigned command line switch serves the same purpose.
- -Vmessages switch to get vectorization messages for all loops in the program. Find out what constructs are inhibiting vectorization of your important loops.
- DISJOINT, NODEPCHK pragmas for disambiguating data dependencies. Especially useful if the target program uses lots of pointers rather than array notation.
- -L parameter for assertion levels to allow vectorization in the presence of pointer arguments. Can be very useful if the program is written to pass most of the data as pointer arguments.
- Vector load lifting. Moves all loads to the top of the loop, as far as they will go (safely). Allows the compiler to do a better job of instruction scheduling.
- Vectorization of complex data types. Uses the permute instructions to reorder interleaved complex data so that it can be operated on with the vector unit.
- Testing for stride one on loops with variable stride. Inserts a run-time test to see if variable array strides are all one; executes a vector version of the loop if the strides are one, otherwise executes the original scalar loop.
- Partial vectorization of loops with strided or gather/scatter vectors.
- Vectorization of "table lookup" loops. Loops that have a branch out of the loop can be vectorized in certain cases.
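Two of the items above, the run-time stride-one test and the vectorization of reduction loops, can be sketched in plain C. The code below is a hand-written illustration under stated assumptions, not VAST output: the names scale_strided and dot_partial_sums are hypothetical, and the four partial sums assume a vector unit that holds four 32-bit floats.

```c
#include <stddef.h>

/* Run-time stride test: take the contiguous (vectorizable) path when the
   stride turns out to be one, otherwise fall back to the original loop. */
void scale_strided(size_t n, float *x, ptrdiff_t stride, float s)
{
    if (stride == 1) {
        for (size_t i = 0; i < n; ++i)      /* unit stride: vector candidate */
            x[i] *= s;
    } else {
        for (size_t i = 0; i < n; ++i)      /* general stride: scalar loop */
            x[(ptrdiff_t)i * stride] *= s;
    }
}

/* Reduction with four partial sums: each accumulator plays the role of one
   lane of a vector register; the lanes are combined after the main loop. */
float dot_partial_sums(size_t n, const float *a, const float *b)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            acc[k] += a[i + k] * b[i + k];
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i)                      /* leftover elements */
        sum += a[i] * b[i];
    return sum;
}
```

Note that splitting a floating-point sum into partial sums changes the order of additions, so results can differ from the scalar loop in the last bits.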

All Absoft Compilers Include FREE Technical Support!
Experienced Support Engineers are available by phone at 248-853-0095 or by email, 9am to 4pm EST (M-F), to answer your Absoft Fortran questions!
