Single Precision Matrix Performance

mixed cg:

In the branch mixed_cg I have added a mixed single/double precision solver for the light doublet (the clover term is not supported yet, but will follow). The solver is selected by specifying

Solver = mixedcg

in the operator section and by setting:

UseSloppyPrecision = yes

I have seen solver speedups of about 30% compared to cg on both scalar and MPI setups. So far the single precision matrix is optimized with intrinsics only for BG/Q. I plan to work on an AVX version next.
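
To illustrate the general idea behind a mixed precision solver, here is a minimal sketch in plain Python. It is not the code from the mixed_cg branch, and the branch may organize the iteration differently. The sketch assumes a defect-correction scheme: an outer loop computes the true residual in double precision, and an inner CG solves for a correction with all arithmetic rounded to single precision. The names f32, matvec, dot, cg and mixed_cg, the dense test matrix, and all tolerances are hypothetical and chosen only for this example.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(u, v, rnd=float):
    """Dot product; every intermediate result is passed through rnd."""
    s = 0.0
    for a, b in zip(u, v):
        s = rnd(s + rnd(a * b))
    return s


def matvec(A, v, rnd=float):
    """Dense matrix-vector product with rounding applied via rnd."""
    return [dot(row, v, rnd) for row in A]


def cg(A, b, tol, max_iter, rnd=float):
    """Plain CG for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = [rnd(bi) for bi in b]
    p = list(r)
    rr = dot(r, r, rnd)
    bb = rr
    for _ in range(max_iter):
        if rr <= tol * tol * bb:  # relative residual reached (also covers b = 0)
            break
        Ap = matvec(A, p, rnd)
        alpha = rnd(rr / dot(p, Ap, rnd))
        x = [rnd(xi + rnd(alpha * pi)) for xi, pi in zip(x, p)]
        r = [rnd(ri - rnd(alpha * ai)) for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r, rnd)
        beta = rnd(rr_new / rr)
        p = [rnd(ri + rnd(beta * pi)) for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def mixed_cg(A, b, tol=1e-12, max_outer=50, inner_tol=1e-4, inner_iter=200):
    """Defect correction: double precision residual, single precision inner CG."""
    A32 = [[f32(a) for a in row] for row in A]  # single precision copy of A
    x = [0.0] * len(b)
    bb = dot(b, b)
    for outer in range(max_outer + 1):
        # True residual, computed in double precision.
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        rr = dot(r, r)
        if rr <= tol * tol * bb or outer == max_outer:
            return x, outer
        # Scale the residual to unit norm so it stays well inside float range.
        scale = math.sqrt(rr)
        e = cg(A32, [ri / scale for ri in r], inner_tol, inner_iter, rnd=f32)
        # Accumulate the correction in double precision.
        x = [xi + scale * ei for xi, ei in zip(x, e)]


if __name__ == "__main__":
    # Small SPD test matrix: 1D Laplacian plus a mass-like diagonal shift.
    n = 8
    A = [[2.1 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    x, outer = mixed_cg(A, [1.0] * n)
    print("outer iterations:", outer)
```

In this sketch the final accuracy is set by the double precision outer loop, while most matrix applications happen in the inner single precision solve. Pure Python gains no speed from this; the example only shows the structure that lets a real implementation benefit from the cheaper 32 bit matrix.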

BG/Q:

The optimized BG/Q version, which includes the overlapping of computation and communication from the InterleavedNDTwistedClover branch, is available in my branch interleaved_mixed_cg. It uses OMP orphaning for the 32 bit matrix. The speedup depends on the local lattice size, as can be seen from this figure: 32-vs-64 bit benchmark
