In gemmlowp, the "output pipeline" is the process that takes a final int32
accumulator value (the output of the compute/kernel stage), and processes it to
obtain the final value (typically a uint8
value) and write it to the
destination matrix.
Gemmlowp has some genericity in what arithmetic transformations take place in the output pipeline, so as to allow different users to implement different quantization paradigms. See low-precision.md and quantization.md.
Besides implementing a quantization paradigm, the other thing that output pipelines is good for, is implementing fused operations where a matrix multiplication feeds into other operations applied to its result, without additional array traversals. For instance, when implementing neural network inference, one might have a Convolutional layer with a bias-addition and an activation. One then wants to feed the result of the matrix multiplication implementing the Convolutional operator itself, directly into the bias-addition and activation function. gemmlowp's output pipelines allow implementing that: the bias-addition and activation function are just additional stages in the output pipeline.
The gemmlowp entry point allowing to use an arbitrary output pipeline is
GemmWithOutputPipeline
in public/gemmlowp.h.
The output pipeline is specified as a std::tuple
of "output stages", each of
which defining an elementary arithmetic transformation.
All available output stages are defined in public/output_stages.h.
The best part to see examples of using various output pipelines is in the unit test,
test/test.cc
specifically in this function:
TestOutputStages
Separately, a self-contained example showing how to use gemmlowp to compute a quantized matrix multiplication with a sounds quantization paradigm, is here: