Single Precision Matrix Performance

Florian edited this page Nov 5, 2013 · 4 revisions

mixed cg:

In the branch mixed_cg I have added a mixed single/double precision CG solver for the light doublet. The clover term is not supported yet, but will follow. The solver is selected by specifying

Solver = mixedcg

in the operator section and by setting:

UseSloppyPrecision = yes

I have seen solver speedups of about 30% compared to cg on both scalar and MPI setups. So far the single precision matrix is optimized only for BG/Q, using intrinsics. I plan to work on an AVX version next.
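For readers unfamiliar with the idea, the following is a minimal sketch of one common way to combine the two precisions: an outer defect-correction loop in double precision that calls an inner CG in single precision. It is not the code from the mixed_cg branch. The operator (a shifted 1D Laplacian), the size N, the inner tolerance and all function names are assumptions chosen only to keep the example self-contained.

```c
#include <assert.h>

#define N 64          /* illustrative problem size (assumption) */
#define SHIFT 0.1     /* makes the toy operator well conditioned (assumption) */

/* y = A x in double precision, A = tridiag(-1, 2 + SHIFT, -1) */
static void apply_d(double *y, const double *x) {
  for (int i = 0; i < N; i++) {
    double left  = (i > 0)     ? x[i - 1] : 0.0;
    double right = (i < N - 1) ? x[i + 1] : 0.0;
    y[i] = (2.0 + SHIFT) * x[i] - left - right;
  }
}

/* The same operator in single precision: this is the "sloppy" matrix */
static void apply_f(float *y, const float *x) {
  for (int i = 0; i < N; i++) {
    float left  = (i > 0)     ? x[i - 1] : 0.0f;
    float right = (i < N - 1) ? x[i + 1] : 0.0f;
    y[i] = (2.0f + (float)SHIFT) * x[i] - left - right;
  }
}

/* Inner CG in single precision: solves A e = r starting from e = 0,
   stops once |res|^2 <= tol^2 |r|^2. Returns the iteration count. */
static int cg_f(float *e, const float *r, float tol, int maxit) {
  float res[N], p[N], Ap[N];
  float rr = 0.0f;
  for (int i = 0; i < N; i++) {
    e[i] = 0.0f; res[i] = r[i]; p[i] = r[i];
    rr += r[i] * r[i];
  }
  const float target = tol * tol * rr;
  int it = 0;
  while (it < maxit && rr > target) {
    apply_f(Ap, p);
    float pAp = 0.0f;
    for (int i = 0; i < N; i++) pAp += p[i] * Ap[i];
    const float alpha = rr / pAp;
    float rr_new = 0.0f;
    for (int i = 0; i < N; i++) {
      e[i]   += alpha * p[i];
      res[i] -= alpha * Ap[i];
      rr_new += res[i] * res[i];
    }
    const float beta = rr_new / rr;
    for (int i = 0; i < N; i++) p[i] = res[i] + beta * p[i];
    rr = rr_new;
    it++;
  }
  return it;
}

/* Outer loop in double precision. The true residual is recomputed in
   double, so the final accuracy is not limited by the float inner solve.
   Returns the number of corrections applied, or -1 if not converged. */
int mixed_cg(double *x, const double *b, double tol, int max_outer) {
  assert(tol > 0.0 && max_outer > 0);
  double r[N], Ax[N];
  float rf[N], ef[N];
  double bb = 0.0;
  for (int i = 0; i < N; i++) { x[i] = 0.0; bb += b[i] * b[i]; }
  for (int outer = 0; outer < max_outer; outer++) {
    apply_d(Ax, x);
    double rr = 0.0, scale = 0.0;
    for (int i = 0; i < N; i++) {
      r[i] = b[i] - Ax[i];
      rr += r[i] * r[i];
      const double a = (r[i] < 0.0) ? -r[i] : r[i];
      if (a > scale) scale = a;      /* largest residual component */
    }
    if (rr <= tol * tol * bb) return outer;
    /* Rescale so the float right-hand side is O(1) and does not underflow */
    for (int i = 0; i < N; i++) rf[i] = (float)(r[i] / scale);
    cg_f(ef, rf, 1e-3f, 10 * N);
    for (int i = 0; i < N; i++) x[i] += scale * (double)ef[i];
  }
  return -1;
}
```

Calling `mixed_cg(x, b, 1e-10, 20)` with a right-hand side `b` of length N fills `x` with the solution. Most of the arithmetic happens in `apply_f`, which is where a faster single precision matrix pays off.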

BG/Q:

The optimized BG/Q version, which includes the overlapping of computation and communication from the InterleavedNDTwistedClover branch, is available in my branch interleaved_mixed_cg. It uses OpenMP orphaning for the 32 bit matrix. The speedup depends on the local lattice size, as can be seen from this figure: 32-vs-64 bit benchmark
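As a reminder of what orphaning means in OpenMP: a worksharing directive such as `omp for` is placed in a function that contains no `parallel` region of its own, and it binds to the parallel region of whichever caller is active. The sketch below illustrates this on a trivial single precision kernel. It is not the 32 bit matrix from interleaved_mixed_cg, and the function names `scale_add_f` and `apply_twice` are made up for the example.

```c
#include <assert.h>

/* Orphaned worksharing construct: there is no "parallel" in this function.
   When called from inside a parallel region, the loop iterations are split
   among the threads of that region. When called from serial code, the loop
   simply runs on the calling thread. */
void scale_add_f(float *y, const float *x, float a, int n) {
  assert(n >= 0);
  #pragma omp for
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
  /* implicit barrier here at the end of the "omp for" */
}

/* The caller opens one parallel region and reuses the same team of
   threads for several kernel calls, instead of forking and joining
   a new team inside every kernel. */
void apply_twice(float *y, const float *x, int n) {
  #pragma omp parallel
  {
    scale_add_f(y, x, 2.0f, n);
    scale_add_f(y, x, 0.5f, n);
  }
}
```

After `apply_twice(y, x, n)` each element holds `y[i] + 2.5f * x[i]`. The code also compiles without OpenMP support, in which case the pragmas are ignored and everything runs serially.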