Real-Time-C++ - Benchmarks

Implementation details

The benchmarks provide code that exercises microcontroller performance.
Various efficiency aspects are emphasized such as integral and floating-point calculations, looping, branching, etc.
Each benchmark is implemented as a single callable function to be called from a scheduled task in the multitasking scheduler configuration.
Every benchmark file can also be compiled separately as a standalone project using C++14, 17, 20, 23 and beyond.
A benchmark digital I/O pin is toggled high/low at the beginning/end of the benchmark run, which allows for real-time measurement with an oscilloscope.
The benchmarks provide scalable, portable means for identifying the performance class of the microcontroller.
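
The pin-toggling idea described above can be illustrated with a minimal sketch. The names benchmark_pin, run_example_benchmark and timed_benchmark are hypothetical and are not taken from the project. On real hardware, the pin type would write to a port register defined in the target-specific MCAL, whereas here it merely records its state so that the sketch can run on a PC.

```cpp
#include <cstdint>

// Hypothetical stand-in for a target-specific digital I/O pin.
// A real implementation would write to a port register instead.
struct benchmark_pin
{
  static bool     state;
  static unsigned edge_count;

  static void set_high() { state = true;  ++edge_count; }
  static void set_low () { state = false; ++edge_count; }
};

bool     benchmark_pin::state      = false;
unsigned benchmark_pin::edge_count = 0U;

// Hypothetical benchmark body: sum the integers 1...100
// and verify the result against the known value 5050.
inline bool run_example_benchmark()
{
  std::uint32_t sum = 0U;

  for(std::uint32_t i = 1U; i <= 100U; ++i) { sum += i; }

  return (sum == 5050U);
}

// Set the pin high, run the benchmark, then set the pin low.
// The width of the resulting pulse is the benchmark runtime.
template<typename PinType, typename BenchmarkFunction>
bool timed_benchmark(BenchmarkFunction benchmark)
{
  PinType::set_high();

  const bool result_is_ok = benchmark();

  PinType::set_low();

  return result_is_ok;
}
```

A scheduled task could then call timed_benchmark<benchmark_pin>(run_example_benchmark) once per cycle and inspect the Boolean result.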

Executing the benchmarks

Executing the benchmarks is straightforward. Select the desired benchmark and activate its corresponding flag in app_benchmark.h. In particular, #define the flag APP_BENCHMARK_TYPE to be one of the pre-defined benchmark types. This is typically done by un-commenting one of the relevant lines around line 34 of that file. Compile the reference application and run it on the target. The benchmark timing is reflected on the microcontroller's corresponding benchmark port pin (the definition of which can be found in its target-specific MCAL).
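
As a rough illustration of this kind of flag-based selection, consider the following sketch. The numeric values assigned to the benchmark types and the helper function selected_benchmark_name are assumptions made for this example only. The actual definitions are those found in app_benchmark.h.

```cpp
// Hypothetical numeric values (assumptions for this sketch).
#define APP_BENCHMARK_TYPE_NONE   0
#define APP_BENCHMARK_TYPE_CRC    2

// Exactly one of the following lines is left un-commented.
//#define APP_BENCHMARK_TYPE   APP_BENCHMARK_TYPE_NONE
#define APP_BENCHMARK_TYPE   APP_BENCHMARK_TYPE_CRC

// The preprocessor then compiles only the selected branch.
#if   (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_NONE)
constexpr const char* selected_benchmark_name() { return "none"; }
#elif (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_CRC)
constexpr const char* selected_benchmark_name() { return "crc"; }
#else
#error Unknown APP_BENCHMARK_TYPE
#endif
```

Because the selection happens at compile time, the code of the benchmarks that are not selected does not end up in the executable.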

Individual benchmarks can also be run standalone on any C++ platform. In the following short link to godbolt, for instance, we have adapted the APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL benchmark for standalone use. The main() subroutine in the benchmark source files is activated with the compiler definition APP_BENCHMARK_STANDALONE_MAIN.
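
The standalone pattern might look roughly like the sketch below. Only the macro name APP_BENCHMARK_STANDALONE_MAIN comes from the text above. The namespace example and the function run_benchmark are hypothetical placeholders for a real benchmark function.

```cpp
#include <cstdint>

namespace example
{
  // Hypothetical benchmark: compute 10! in a loop
  // and verify it against the known value 3628800.
  inline bool run_benchmark()
  {
    std::uint32_t factorial = 1U;

    for(std::uint32_t i = 2U; i <= 10U; ++i) { factorial *= i; }

    return (factorial == UINT32_C(3628800));
  }
}

#if defined(APP_BENCHMARK_STANDALONE_MAIN)
// This main() is compiled only when the macro is defined,
// so the same source file also fits into the full application.
int main()
{
  const bool result_is_ok = example::run_benchmark();

  return (result_is_ok ? 0 : -1);
}
#endif
```

With GCC, for instance, the macro can be supplied on the command line via -DAPP_BENCHMARK_STANDALONE_MAIN.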

Individual benchmarks

- app_benchmark_none.cpp via #define APP_BENCHMARK_TYPE_NONE is an empty benchmark with merely a Boolean function call returning true.
- app_benchmark_complex.cpp via #define APP_BENCHMARK_TYPE_COMPLEX computes a floating-point complex-valued trigonometric sine function using the extended_complex::complex template class.
- app_benchmark_crc.cpp via #define APP_BENCHMARK_TYPE_CRC calculates a $32$-bit, byte-oriented CRC result described in Sect. 6.1 of the book.
- app_benchmark_fast_math.cpp via #define APP_BENCHMARK_TYPE_FAST_MATH calculates reduced, time-optimized floating-point elementary transcendental functions.
- app_benchmark_filter.cpp via #define APP_BENCHMARK_TYPE_FILTER calculates an integral FIR filter sampling result.
- app_benchmark_fixed_point.cpp via #define APP_BENCHMARK_TYPE_FIXED_POINT calculates the first derivative of an elementary function using the self-written fixed_point template class in Chap. 13 of the book.
- app_benchmark_float.cpp via #define APP_BENCHMARK_TYPE_FLOAT implements the floating-point examples detailed in Sect. 12.4 of the book.
- app_benchmark_wide_integer.cpp via #define APP_BENCHMARK_TYPE_WIDE_INTEGER performs $256$-bit unsigned big integer calculations using the uintwide_t class.
- app_benchmark_pi_spigot.cpp via #define APP_BENCHMARK_TYPE_PI_SPIGOT performs a pi calculation using a template-based spigot algorithm with calculation steps divided among the slices of the idle task.
- app_benchmark_pi_spigot_single.cpp via #define APP_BENCHMARK_TYPE_PI_SPIGOT_SINGLE performs the same pi calculation as above, implemented as a single function call.
- app_benchmark_hash.cpp via #define APP_BENCHMARK_TYPE_HASH computes a $160$-bit hash checksum of a $3$-byte character-based message.
- app_benchmark_wide_decimal.cpp via #define APP_BENCHMARK_TYPE_WIDE_DECIMAL computes a $100$ decimal digit square root using the decwide_t template class.
- app_benchmark_trapezoid_integral.cpp via #define APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL computes the numerical floating-point result of a Bessel function using a recursive trapezoid integration routine.
- app_benchmark_pi_agm.cpp via #define APP_BENCHMARK_TYPE_PI_AGM computes $53$ decimal digits of pi (or optionally $101$ decimal digits of pi) using a Gauss AGM method with the decwide_t template class having a so-called limb type of std::uint16_t.
- app_benchmark_boost_math_cbrt_tgamma.cpp via #define APP_BENCHMARK_TYPE_BOOST_MATH_CBRT_TGAMMA uses Boost.Math to compute the cube root of various Gamma function values.
- app_benchmark_boost_math_cyl_bessel_j.cpp via #define APP_BENCHMARK_TYPE_BOOST_MATH_CYL_BESSEL_J also uses Boost.Math to calculate cylindrical Bessel functions of small, non-integer order.
- app_benchmark_cnl_scaled_integer.cpp via #define APP_BENCHMARK_TYPE_CNL_SCALED_INTEGER brings a small subset of the CNL Library onto the metal by exercising various elementary quadratic calculations with the fixed-point representations of cnl::scaled_integer.
- app_benchmark_soft_double_h2f1.cpp via #define APP_BENCHMARK_TYPE_SOFT_DOUBLE_H2F1 calculates an ${\approx}~{15}$ decimal digit hypergeometric function value using a classic iterative rational approximation scheme. This calculation is also included as an example in the soft_double project.
- app_benchmark_boost_multiprecision_cbrt.cpp via #define APP_BENCHMARK_TYPE_BOOST_MULTIPRECISION_CBRT uses Boost.Multiprecision in combination with Boost.Math to compute $101$ decimal digits of a cube root function.
- app_benchmark_hash_sha256.cpp via #define APP_BENCHMARK_TYPE_HASH_SHA256 computes a $256$-bit hash checksum of a short $3$-byte character-based message.
- app_benchmark_ecc_generic_ecc.cpp via #define APP_BENCHMARK_TYPE_ECC_GENERIC_ECC provides an intuitive view on elliptic-curve algebra, depicting a well-known $256$-bit cryptographic key-gen/sign/verify method. This benchmark is actually too lengthy to run on most of our embedded targets (other than BBB or RPI-zero), and adaptations of the OS/watchdog are required in order to run this benchmark on the metal.
- app_benchmark_non_std_decimal.cpp via #define APP_BENCHMARK_TYPE_NON_STD_DECIMAL carries out a $64$-bit decimal-floating-point calculation of the exponential function using the contemporary cppalliance/decimal library. This benchmark does not, at the moment, run on the AVR target, but requires a larger microcontroller such as one of the $32$-bit ARM(R) devices.
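
To give a flavor of one of the entries above, the following sketch shows how a Bessel function value can be obtained with a recursive trapezoid rule. It is not the project's implementation. The function names integral_trapezoid and cyl_bessel_j_sketch, the tolerance and the iteration limits are assumptions made for this example. The sketch relies on the well-known integral representation $J_{n}(x) = \frac{1}{\pi}\int_{0}^{\pi}\cos(x\sin t - nt)\,dt$ for integer order $n$.

```cpp
#include <cmath>
#include <cstdint>

// Recursive trapezoid rule: each refinement halves the step size
// and only evaluates the function at the newly added midpoints.
template<typename FloatType, typename FunctionType>
FloatType integral_trapezoid(const FloatType a,
                             const FloatType b,
                             const FloatType tol,
                             FunctionType    f)
{
  using std::fabs;

  std::uint_fast32_t n = 1U;     // Current number of intervals.
  FloatType          h = b - a;  // Current step size.

  FloatType result = (f(a) + f(b)) * (h / 2);

  // The iteration limit of 16 is an arbitrary choice for this sketch.
  for(unsigned k = 0U; k < 16U; ++k)
  {
    h /= 2;

    FloatType sum = 0;

    // Sum the function values at the new midpoints a + (2j - 1) * h.
    for(std::uint_fast32_t j = 1U; j <= n; ++j)
    {
      sum += f(a + (static_cast<FloatType>((j * 2U) - 1U) * h));
    }

    const FloatType previous = result;

    result = (previous / 2) + (h * sum);

    n *= 2U;

    // Require a few refinements before testing for convergence
    // in order to avoid stopping too early by coincidence.
    if((k > 2U) && (fabs(result - previous) < tol)) { break; }
  }

  return result;
}

// Cylindrical Bessel function of integer order n via its integral
// representation J_n(x) = (1 / pi) * int_0^pi cos(x sin(t) - n t) dt.
template<typename FloatType>
FloatType cyl_bessel_j_sketch(const unsigned n, const FloatType x)
{
  const FloatType pi  = static_cast<FloatType>(3.14159265358979323846L);
  const FloatType tol = static_cast<FloatType>(1.0E-9L);

  const auto integrand =
    [n, x](const FloatType t) -> FloatType
    {
      using std::cos;
      using std::sin;

      return cos((x * sin(t)) - (static_cast<FloatType>(n) * t));
    };

  return integral_trapezoid(static_cast<FloatType>(0), pi, tol, integrand) / pi;
}
```

For example, cyl_bessel_j_sketch(2U, 1.23) evaluates $J_{2}(1.23)$, which is approximately $0.16637$. The trapezoid rule converges particularly fast here because the integrand is smooth and periodic.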

Performance classes

Most of the benchmarks run on each supported target system. Experience with runs on the individual target systems reveals a wide range of microcontroller performance classes.

Consider, for instance, app_benchmark_pi_agm.cpp which exercises the benchmark of type APP_BENCHMARK_TYPE_PI_AGM. This benchmark computes ${\sim}50{\ldots}100$ decimal digits of the mathematical constant $\pi$ using a Gauss AGM method with help from the decwide_t template class.

A typical range of performance classes is shown in the following table. The benchmark used is a ${\sim}100$ decimal digit AGM $\pi$ calculation.

| Target | runtime $[ms]$ | relative |
| --- | --- | --- |
| `am335x` | 1.5 | 1.0 |
| `stm32f446` | 5.1 | 3.4 |
| `rpi_pico2_rp2350` | 6.3 | 4.2 |
| `wch_ch32v307` | 8.0 | 5.3 |
| `rpi_pico_rp2040` | 19 | 13 |
| `avr` | 420 | 280 |

The performance classes differ strikingly between the $8$-bit MICROCHIP(R) AVR controller of the ARDUINO and the $32$-bit ARM(R) Cortex(R) A8 controller of the BeagleBone Black Edition, Rev. C. The $\pi$ calculation requires approximately $420~\text{ms}$ and $1.5~\text{ms}$, respectively, on these two microcontroller systems.

The $32$-bit ARM(R) Cortex(R) M4F controller on the stm32f446 board falls between the two extremes, with a result of $5.1~\text{ms}$.

The $32$-bit RISC-V controller (having a novel open-source core) on the wch_ch32v307 board boasts a quite respectable time of $8.0~\text{ms}$.

Using only one core (core 1) on the $32$-bit ARM(R) Cortex(R) M0+ controller of the rpi_pico_rp2040 board results in a calculation time of $19~\text{ms}$. The next generation rpi_pico2_rp2350 with dual ARM(R) Cortex(R) M33 cores clearly improves on this (still using only core 1) with a time of $6.3~\text{ms}$. This is slightly more than $3$ times faster than its predecessor.
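
The relative values in the table and the speed-up factor quoted above follow from simple ratios of the measured runtimes. The short sketch below reproduces this arithmetic. The runtimes are the ones listed in the table, while the type target_result, the array agm_results and the function relative_runtime are hypothetical names introduced for this example.

```cpp
#include <array>
#include <cstddef>

struct target_result
{
  const char* name;
  double      runtime_ms;
};

// Runtimes of the ~100 decimal digit AGM pi calculation (from the table).
constexpr std::array<target_result, 6U> agm_results
{{
  { "am335x",            1.5 },
  { "stm32f446",         5.1 },
  { "rpi_pico2_rp2350",  6.3 },
  { "wch_ch32v307",      8.0 },
  { "rpi_pico_rp2040",  19.0 },
  { "avr",             420.0 }
}};

// Ratio of one target's runtime to that of a reference target.
constexpr double relative_runtime(const target_result& result,
                                  const target_result& reference)
{
  return result.runtime_ms / reference.runtime_ms;
}
```

Taking am335x as the reference, relative_runtime(agm_results[5U], agm_results[0U]) gives $420 / 1.5 = 280$ for the avr target. Similarly, comparing rpi_pico_rp2040 with rpi_pico2_rp2350 gives $19 / 6.3 \approx 3.02$.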
