- The benchmarks provide code that exercises microcontroller performance.
- Various efficiency aspects are emphasized such as integral and floating-point calculations, looping, branching, etc.
- Each benchmark is implemented as a single callable function to be called from a scheduled task in the multitasking scheduler configuration.
- Every benchmark file can also be compiled separately as a standalone C++14, 17, 20, 23 and beyond project.
- A benchmark digital I/O pin is toggled high at the beginning and low at the end of the benchmark run, allowing real-time measurement with an oscilloscope.
- The benchmarks provide scalable, portable means for identifying the performance class of the microcontroller.
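
The pin-toggle timing idea from the list above can be sketched as follows. This is a minimal illustration only: `benchmark_port_pin`, `run_timed` and `run_example_benchmark` are hypothetical names, and the pin is simulated in memory rather than written to a real port register.

```cpp
#include <cstdint>

// Hypothetical stand-in for a target-specific MCAL port pin.
// On real hardware these functions would write to a port register.
struct benchmark_port_pin
{
  static bool     level;        // simulated pin state
  static unsigned toggle_count; // simulated number of edges

  static void set_pin_high() { level = true;  ++toggle_count; }
  static void set_pin_low () { level = false; ++toggle_count; }
};

bool     benchmark_port_pin::level        = false;
unsigned benchmark_port_pin::toggle_count = 0U;

// Placeholder workload (an assumption, not one of the real benchmarks).
bool run_example_benchmark()
{
  std::uint32_t sum = 0U;

  for(std::uint32_t i = 1U; i <= 100U; ++i) { sum += i; }

  return (sum == 5050U);
}

// Set the pin high, run the benchmark, then set the pin low.
// The pulse width seen on an oscilloscope is the benchmark runtime.
template<typename port_pin_type>
bool run_timed(bool (*benchmark)())
{
  port_pin_type::set_pin_high();

  const bool result_is_ok = benchmark();

  port_pin_type::set_pin_low();

  return result_is_ok;
}

#if defined(APP_BENCHMARK_STANDALONE_MAIN)
int main()
{
  return (run_timed<benchmark_port_pin>(run_example_benchmark) ? 0 : 1);
}
#endif
```
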

Executing the benchmarks is straightforward. Select the desired benchmark and
activate its corresponding flag in `app_benchmark.h`.
In particular, `#define` the flag `APP_BENCHMARK_TYPE`
to be one of the pre-defined benchmark types.
This is typically done by simply un-commenting one of the relevant lines around
line 34 of that file.
Compile the reference application and run it on the target.
The benchmark timing will be reflected on the microcontroller's corresponding
benchmark port pin (the definition of which can be found in its target-specific MCAL).
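
The selection mechanism might look roughly like the sketch below. The numeric values of the type flags and the helper `app_benchmark_name()` are assumptions made for illustration; the actual header defines its own values and its own set of types.

```cpp
#include <iostream>

// Hypothetical numeric values; the real app_benchmark.h defines its own.
#define APP_BENCHMARK_TYPE_NONE  0
#define APP_BENCHMARK_TYPE_CRC   2

// Un-comment exactly one of the following lines.
//#define APP_BENCHMARK_TYPE  APP_BENCHMARK_TYPE_NONE
#define APP_BENCHMARK_TYPE  APP_BENCHMARK_TYPE_CRC

// Only the code of the selected benchmark is compiled.
#if (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_NONE)
const char* app_benchmark_name() { return "none"; }
#elif (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_CRC)
const char* app_benchmark_name() { return "crc"; }
#else
#error "APP_BENCHMARK_TYPE must be one of the pre-defined benchmark types"
#endif

#if defined(APP_BENCHMARK_STANDALONE_MAIN)
int main()
{
  std::cout << "selected benchmark: " << app_benchmark_name() << std::endl;
}
#endif
```

Selecting the benchmark at compile time keeps the unused benchmarks out of the image, which matters on small targets.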

Individual benchmarks can also be run standalone on any C++ platform.
On godbolt, for instance, we have adapted the
`APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL`
benchmark for standalone use.
The `main()` subroutine in the benchmark source files is activated
with the compiler definition `APP_BENCHMARK_STANDALONE_MAIN`.
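
A standalone build in this style could be sketched as below. It is not the repository's code: the names in namespace `local`, the order and argument of the Bessel function, and the control value are assumptions chosen for illustration. The sketch evaluates Bessel's integral for integer order, $J_n(x) = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta - x\sin\theta)\,d\theta$, with a trapezoid rule that is refined by repeated halving of the step size.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

namespace local
{
  // Trapezoid rule with successive refinement: each pass halves the step
  // and adds only the new midpoints to the previous estimate.
  template<typename real_type, typename function_type>
  real_type integral(const real_type a, const real_type b, const real_type tol, function_type f)
  {
    std::uint32_t n      = 1U;
    real_type     h      = b - a;
    real_type     result = (f(a) + f(b)) * (h / 2);

    for(std::uint32_t k = 0U; k < 16U; ++k)
    {
      h /= 2;

      real_type sum = 0;

      for(std::uint32_t j = 1U; j <= n; ++j)
      {
        sum += f(a + (static_cast<real_type>((2U * j) - 1U) * h));
      }

      const real_type previous = result;

      result = (previous / 2) + (h * sum);

      n *= 2U;

      // Stop when two successive estimates agree to the relative tolerance.
      if((k > 1U) && (std::fabs(result - previous) < (tol * std::fabs(result)))) { break; }
    }

    return result;
  }

  // Bessel function of integer order via Bessel's integral.
  template<typename real_type>
  real_type cyl_bessel_j(const unsigned order, const real_type x)
  {
    const real_type pi  = static_cast<real_type>(3.14159265358979323846L);
    const real_type tol = std::sqrt(std::numeric_limits<real_type>::epsilon());

    const real_type integration_result =
      integral(static_cast<real_type>(0),
               pi,
               tol,
               [order, x](const real_type t)
               {
                 return std::cos((static_cast<real_type>(order) * t) - (x * std::sin(t)));
               });

    return integration_result / pi;
  }
}

// Single callable benchmark function returning true on success.
bool run_trapezoid_integral()
{
  const double j2 = local::cyl_bessel_j(2U, 1.23);

  // Control value obtained from the power series of J_2(1.23).
  return (std::fabs(j2 - 0.16636938) < 1.0E-6);
}

#if defined(APP_BENCHMARK_STANDALONE_MAIN)
int main()
{
  return (run_trapezoid_integral() ? 0 : 1);
}
#endif
```

Compiling with `-DAPP_BENCHMARK_STANDALONE_MAIN` yields a program whose exit code is zero when the result check passes.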

- `#define APP_BENCHMARK_TYPE_NONE` is an empty benchmark with merely a Boolean function call returning `true`.
- `#define APP_BENCHMARK_TYPE_COMPLEX` computes a floating-point complex-valued trigonometric sine function using the `extended_complex::complex` template class.
- `#define APP_BENCHMARK_TYPE_CRC` calculates a $32$-bit, byte-oriented CRC result described in Sect. 6.1 of the book.
- `#define APP_BENCHMARK_TYPE_FAST_MATH` calculates reduced, time-optimized floating-point elementary transcendental functions.
- `#define APP_BENCHMARK_TYPE_FILTER` calculates an integral FIR filter sampling result.
- `#define APP_BENCHMARK_TYPE_FIXED_POINT` calculates the first derivative of an elementary function using the self-written `fixed_point` template class in Chap. 13 of the book.
- `#define APP_BENCHMARK_TYPE_FLOAT` implements the floating-point examples detailed in Sect. 12.4 of the book.
- `#define APP_BENCHMARK_TYPE_WIDE_INTEGER` performs $256$-bit unsigned big integer calculations using the `uintwide_t` class.
- `#define APP_BENCHMARK_TYPE_PI_SPIGOT` performs a pi calculation using a template-based spigot algorithm with calculation steps divided among the slices of the idle task.
- `#define APP_BENCHMARK_TYPE_PI_SPIGOT_SINGLE` does the same pi calculation as above, implemented as a single function call.
- `#define APP_BENCHMARK_TYPE_HASH` computes a $160$-bit hash checksum of a $3$-byte character-based message.
- `#define APP_BENCHMARK_TYPE_WIDE_DECIMAL` computes a $100$ decimal digit square root using the `decwide_t` template class.
- `#define APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL` computes the numerical floating-point result of a Bessel function using a recursive trapezoid integration routine.
- `#define APP_BENCHMARK_TYPE_PI_AGM` computes $53$ decimal digits of pi (or optionally $101$ decimal digits of pi) using a Gauss AGM method with the `decwide_t` template class having a so-called limb type of `std::uint16_t`.
- `#define APP_BENCHMARK_TYPE_BOOST_MATH_CBRT_TGAMMA` uses Boost.Math to compute the cube root of various Gamma function values.
- `#define APP_BENCHMARK_TYPE_BOOST_MATH_CYL_BESSEL_J` also uses Boost.Math to calculate cylindrical Bessel functions of small, non-integer order.
- `#define APP_BENCHMARK_TYPE_CNL_SCALED_INTEGER` brings a small subset of the CNL Library onto the metal by exercising various elementary quadratic calculations with the fixed-point representations of `cnl::scaled_integer`.
- `#define APP_BENCHMARK_TYPE_SOFT_DOUBLE_H2F1` calculates an ${\approx}~{15}$ decimal digit hypergeometric function value using a classic iterative rational approximation scheme. This calculation is also included as an example in the soft_double project.
- `#define APP_BENCHMARK_TYPE_BOOST_MULTIPRECISION_CBRT` uses Boost.Multiprecision in combination with Boost.Math to compute $101$ decimal digits of a cube root function.
- `#define APP_BENCHMARK_TYPE_HASH_SHA256` computes a $256$-bit hash checksum of a short $3$-byte character-based message.
- `#define APP_BENCHMARK_TYPE_ECC_GENERIC_ECC` provides an intuitive view on elliptic-curve algebra, depicting a well-known $256$-bit cryptographic key-gen/sign/verify method. This benchmark is actually too lengthy to run on most of our embedded targets (other than BBB or RPI-zero), and adaptations of OS/watchdog are required in order to run this benchmark on the metal.
- `#define APP_BENCHMARK_TYPE_NON_STD_DECIMAL` carries out a $64$-bit decimal-floating-point calculation of the exponential function using the contemporary cppalliance/decimal library. This benchmark does not, at the moment, run on the AVR target, but requires a larger microcontroller such as one of the $32$-bit ARM(R) devices.

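
To give a feel for the size of a typical benchmark body, here is a minimal sketch of a $32$-bit CRC processed one byte at a time. It assumes the CRC-32/MPEG-2 parametrization (polynomial `0x04C11DB7`, initial value `0xFFFFFFFF`, no reflection, no final XOR); the list above does not state which parametrization the benchmark uses, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32/MPEG-2 over a byte buffer (assumed parametrization).
std::uint32_t crc32_mpeg2(const std::uint8_t* data, const std::size_t count)
{
  std::uint32_t crc = UINT32_C(0xFFFFFFFF);

  for(std::size_t i = 0U; i < count; ++i)
  {
    // Bring the next byte into the top of the CRC register.
    crc ^= (static_cast<std::uint32_t>(data[i]) << 24U);

    for(unsigned bit = 0U; bit < 8U; ++bit)
    {
      const bool msb_is_set = ((crc & UINT32_C(0x80000000)) != 0U);

      crc <<= 1U;

      if(msb_is_set) { crc ^= UINT32_C(0x04C11DB7); }
    }
  }

  return crc;
}

// Single callable benchmark function returning true on success.
bool run_crc()
{
  // The ASCII characters "123456789", the customary CRC check message.
  const std::uint8_t data[9U] =
  {
    0x31U, 0x32U, 0x33U, 0x34U, 0x35U, 0x36U, 0x37U, 0x38U, 0x39U
  };

  // Published check value of CRC-32/MPEG-2 for this message.
  return (crc32_mpeg2(data, sizeof(data)) == UINT32_C(0x0376E6E7));
}

#if defined(APP_BENCHMARK_STANDALONE_MAIN)
int main()
{
  return (run_crc() ? 0 : 1);
}
#endif
```

A table-driven variant would trade flash for speed; the bitwise form is shown here only because it is the shortest.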
Most of the benchmarks run on each supported target system. Experience with runs on the individual target systems reveals a wide range of microcontroller performance classes.

Consider, for instance, `app_benchmark_pi_agm.cpp`,
which exercises the benchmark of type `APP_BENCHMARK_TYPE_PI_AGM`.
This benchmark computes $53$ decimal digits of pi (or optionally $101$ decimal digits of pi)
using a Gauss AGM method with the `decwide_t` template class.
A typical range of performance classes for this benchmark is shown in the following table.

| Target | runtime | relative |
|---|---|---|
| `am335x` | 1.5 | 1.0 |
| `stm32f446` | 5.1 | 3.4 |
| `rpi_pico2_rp2350` | 6.3 | 4.2 |
| `wch_ch32v307` | 8.0 | 5.3 |
| `rpi_pico_rp2040` | 19 | 13 |
| `avr` | 420 | 280 |

There are strikingly differing performance classes
among the targets, ranging from a relative runtime of 1.0 on `am335x` to 280 on `avr`.
The runtimes quoted below are the tabulated values.
The `stm32f446` board performs the calculation in
the middle of the two extremes, with a runtime of 5.1.
The `wch_ch32v307` board boasts a quite respectable runtime of 8.0.
Using only one core (core 1) on the `rpi_pico_rp2040`
board results in a runtime of 19.
The `rpi_pico2_rp2350` with dual ARM(R) Cortex(R) M33 cores definitively improves on this
(still using only core 1) with a runtime of 6.3.