The first picture compares the speedup obtained by changing the
minimum recursion level. In the most naive (and beautiful) version the
algorithm recurses all the way down to blocks of size 1. However, this
produces many function calls, so it is more efficient to write special
code to handle all the small cases, and thereby avoid those calls. As
the problem gets
large, the computational cost is completely dominated by the work done
on the large blocks, so the speedup from writing the small cases
explicitly is only visible for small problem sizes.
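
The idea of a minimum recursion level can be sketched as follows. The
text above does not say which factorization the routine computes, so
this sketch assumes a Cholesky factorization purely for illustration.
The names chol_base, chol_recursive and min_size are hypothetical, and
plain Python loops stand in for the BLAS calls a real implementation
would use. With min_size=1 the recursion goes all the way down; a
larger value hands small blocks to the explicit loop code instead.

```python
import math

def chol_base(a, lo, hi):
    """Explicit (non-recursive) Cholesky of the block a[lo:hi][lo:hi], in place.

    Only the lower triangle is read and written.
    """
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]

def chol_recursive(a, lo, hi, min_size=1):
    """Recursive Cholesky of a[lo:hi][lo:hi]; stops recursing at min_size."""
    n = hi - lo
    if n <= max(min_size, 1):
        # Small case: handled by explicit code, no further function calls.
        chol_base(a, lo, hi)
        return
    mid = lo + n // 2
    # Factor the leading block: A11 = L11 * L11^T.
    chol_recursive(a, lo, mid, min_size)
    # Triangular solve: L21 = A21 * L11^(-T).
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]
    # Update the trailing block: A22 = A22 - L21 * L21^T (lower triangle).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))
    # Factor the updated trailing block.
    chol_recursive(a, mid, hi, min_size)

def lower(a):
    """Return the lower triangle of a as a list of rows."""
    return [row[: i + 1] for i, row in enumerate(a)]

if __name__ == "__main__":
    m = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    chol_recursive(m, 0, len(m), min_size=2)
    print(lower(m))
```

Both settings of min_size compute the same factor; only the number of
function calls differs, which is where the speedup for small problems
comes from.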
The following picture compares the recursive routine with the
standard algorithm supplied by LAPACK, and with the LAPACK algorithm
using the ATLAS block size supplied by the routine ilaenv.f. The ATLAS
BLAS is used in all cases.
The new recursive algorithm is about 10% faster than the LAPACK
algorithm. What is a little disturbing, however, is that the LAPACK
algorithm actually runs slower when using the ATLAS block size.
For large problem sizes the situation looks like this. The recursive
algorithm is still in the lead.
The next picture shows the large problem size test performed with LU
on the same architecture using all the same libraries, to see whether
the LAPACK/ilaenv combination is still slower than LAPACK without
knowledge of the ATLAS block size.
In this case, supplying LAPACK with ilaenv does give better performance.