
Security Best Practices

Anas Nashif edited this page Jul 8, 2018 · 1 revision

This is a draft for review by the Zephyr Project Security Working Group, before being published as part of the Zephyr project documentation set. It was written more than a year ago, so it needs to be updated to reflect the current reality. It should also be expanded to explain Kconfig options and how they interact with our security hardening features.

Best practices for Secure Zephyr Applications and Deployment

The Zephyr OS is a small, scalable, real-time operating system that supports multiple architectures and resource-constrained systems. Because of these constraints, secure development using Zephyr OS can be different from secure development on other operating systems.

Security is a shared responsibility. While the Zephyr kernel core protects against many threats, product developers must ensure they address all other threats that are applicable to their system. The Zephyr OS security boundary separates the kernel from the “outside world”. This boundary is aimed at preventing external entities from exploiting the kernel, and it helps prevent attacks by the host via host-kernel communication channels (e.g. serial port communication).

As part of providing clear and comprehensive documentation on related security issues and suggested configurations, this document outlines some key things developers using Zephyr OS should know. All of these items can have a direct impact on the security objectives of a product using the platform; as such, they must be carefully evaluated.

Developers can (and often will) disable features not in use

Because Zephyr OS devices are very resource-constrained, developers typically disable unused components. Besides reducing the memory footprint, excluding unused features has a positive effect on security by reducing the number of potential attack surfaces.

Build-time configuration options give developers an easy way to disable unused features and to enable desired capabilities and security-related features. The complete list of configuration options supported by the build system is documented at: https://www.zephyrproject.org/doc/reference/kconfig/index.html.

Threads keep separate contexts

Developers can assume threads maintain separate context, and upon a context switch, there is nothing left from the previous thread context in the CPU.

Threads using Floating Point (FP) operations are marked as such, so the system will context switch FP/SSE/MMX status as needed.

Interrupt Service Routines (ISRs) are not FP capable. Any use of FP/SSE/MMX by an ISR requires the application itself to save and restore context. This is not a recommended practice.

System and application separation

Although applications are compiled together with the kernel into a single binary, there is an optional memory protection feature that provides separation between the kernel and applications. The feature is optional because it relies on hardware features (such as Memory Protection Units or Memory Management Units) and incurs some overhead that smaller systems might not be able to afford.

With this feature, all interaction with device drivers and some kernel objects (e.g. synchronization primitives) goes through a system call layer. This layer not only validates all parameters, but also consults an access control list to determine whether the thread that invoked a particular system call has been given permission. System calls also validate object types (e.g. it's not possible to call semaphore syscalls with pipe objects) and initialization status.
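The checks described above can be pictured with a small, self-contained model. This is only a sketch: the `kobj` structure, the error codes, and `sys_sem_give()` are hypothetical names invented for illustration and are not Zephyr's actual system call API. In particular, a real kernel would take the calling thread from its own scheduler state rather than from a caller-supplied argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object model, for illustration only (not Zephyr's API). */
enum kobj_type { KOBJ_SEM, KOBJ_PIPE };

struct kobj {
    enum kobj_type type;   /* what kind of kernel object this is */
    bool initialized;      /* set once the object has been initialized */
    uint32_t perm_mask;    /* bit N set: thread N may use this object */
};

enum sys_err {
    SYS_OK = 0,
    SYS_EINVAL = -1,    /* bad pointer or thread id */
    SYS_EBADTYPE = -2,  /* object is of the wrong type */
    SYS_ENOTINIT = -3,  /* object was never initialized */
    SYS_EPERM = -4      /* thread was not granted access */
};

/* Run every check before any privileged work is done. */
static int kobj_validate(const struct kobj *obj, enum kobj_type expected,
                         unsigned int thread_id)
{
    if (obj == NULL || thread_id >= 32) {
        return SYS_EINVAL;
    }
    if (obj->type != expected) {
        return SYS_EBADTYPE;
    }
    if (!obj->initialized) {
        return SYS_ENOTINIT;
    }
    if ((obj->perm_mask & (UINT32_C(1) << thread_id)) == 0) {
        return SYS_EPERM;
    }
    return SYS_OK;
}

/* Hypothetical semaphore entry point: validate first, then act. */
int sys_sem_give(struct kobj *obj, unsigned int thread_id)
{
    int err = kobj_validate(obj, KOBJ_SEM, thread_id);

    if (err != SYS_OK) {
        return err;
    }
    /* The privileged semaphore operation would run here. */
    return SYS_OK;
}
```

In this model, passing a pipe object to `sys_sem_give()` fails with `SYS_EBADTYPE`, and a thread whose bit is not set in `perm_mask` is rejected with `SYS_EPERM`, mirroring the type and permission checks described above.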

On systems without the necessary hardware features, there is no “principle of least privilege” possible in Zephyr OS. As such, application developers must adhere to the highest standards of secure software development, as any mistake will result in full system access for the attacker. Protecting against this would cause unacceptable memory and performance impacts. It is also impossible to protect the kernel against attackers who have direct access to the hardware resources it utilizes.

Assumed Trust

There is no prevention of attacks from trusted parties including Zephyr Project team members, OEMs, application developers, and third parties.

This document does not address security issues associated with the following adversaries:

  • Accidental or malicious behavior of the kernel core. We assume these components can be trusted. It is also impractical to protect the kernel against attackers having supervisor-level access without unacceptable memory and performance impacts.
  • Accidental or malicious misbehavior of the OEM supplied portions of the product. This includes anything that has access to physical memory regions, interrupt controller, etc.
  • Accidental or malicious misbehavior of the system builder. It is assumed that the developer building the system containing Zephyr OS is a trusted developer who configures security features correctly.

Communication protocols have been stripped of some security and privacy features

Deprecated:

The Zephyr OS provides the architecture necessary for secure cryptography and secure communication, and ensuring that necessary keys and device IDs are not exposed. Some of the protocols included in the Zephyr OS would normally have the ability to use cryptography to enhance their security and privacy. However, there is no RawPublicKey for Constrained Application Protocol (CoAP) messages, and no signing is supported in routing protocol for Low-Power and Lossy Networks (RPL). Developers needing such features to ensure their IoT devices maintain secure communications will need to implement these features themselves.

Attackers may have full access to devices

Because the Zephyr OS is designed for small Internet of Things devices, it may be easy for an attacker to gain physical access to a device. Even proximity access can provide close-range radio frequency or magnetic interference attack opportunities. This has many implications for security that are difficult to mitigate: it is probably not safe to assume that other devices running Zephyr OS have not been tampered with, and it is probably not safe to assume that the network is only populated by fully trusted devices. Developers must take note of these issues in their own threat models and take actions to mitigate them when possible. Zephyr OS does not provide protections for physical devices, as this is out of scope.

Denial of Service protections

The Zephyr project does not provide protections against Denial of Service (DoS) attacks within the kernel, because of their negative impact on performance and memory footprint. If such protections are desired, they must be included as part of a larger system, for example, in a gateway or firewall.

Protection against compromised I/O

The Zephyr project does not provide protections against compromised I/O interfaces that send valid commands. The kernel provides no error checking or handling for such input, because of the performance and memory footprint cost. If protection is required, it must be provided by the application using the input. For example, this could include recovering from malformed data such as Bluetooth Low Energy GATT commands and data from USB interfaces. Zephyr OS does endeavor to provide robust drivers for all I/O interfaces (including Bluetooth Low Energy, USB, SPI, UART, and Wi-Fi). This robustness is for basic availability, so that drivers do not crash. Malicious input is not detected, sanitized, or handled. Comprehensive input validation is an application layer responsibility.
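As an illustration of what application-layer validation can look like, here is a minimal sketch of a parser for a made-up command frame. The frame layout (one opcode byte, one length byte, then the payload), the opcodes, and the `cmd_parse()` name are all assumptions chosen for this example; they do not correspond to any Zephyr or Bluetooth API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout: [opcode:1][len:1][payload:len]. */
#define CMD_MAX_PAYLOAD 16u

enum cmd_opcode { CMD_PING = 0x01, CMD_SET_LED = 0x02 };

struct cmd {
    uint8_t opcode;
    uint8_t len;
    uint8_t payload[CMD_MAX_PAYLOAD];
};

/* Returns 0 on success, -1 if the frame is rejected. */
int cmd_parse(const uint8_t *buf, size_t buf_len, struct cmd *out)
{
    if (buf == NULL || out == NULL || buf_len < 2) {
        return -1; /* too short to hold the header */
    }

    uint8_t opcode = buf[0];
    uint8_t len = buf[1];

    if (opcode != CMD_PING && opcode != CMD_SET_LED) {
        return -1; /* only accept opcodes we know */
    }
    if (len > CMD_MAX_PAYLOAD) {
        return -1; /* payload would not fit in struct cmd */
    }
    if ((size_t)len != buf_len - 2) {
        return -1; /* declared length disagrees with received bytes */
    }

    out->opcode = opcode;
    out->len = len;
    memcpy(out->payload, buf + 2, len);
    return 0;
}
```

The point of the sketch is that the length field supplied by the sender is never trusted on its own: it is checked against both the destination buffer size and the number of bytes actually received before anything is copied.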

Integrity of the Zephyr Image

The developer must ensure the integrity of the Zephyr image that is loaded into memory prior to its activation. The Zephyr OS does not contain any internal mechanisms (such as a manifest list) to detect the presence of unauthorized files or routines that have been incorporated into the image, nor does it have any internal mechanism (such as a signature validation scheme) to detect tampering with authorized files.

If such protections are desired, these protections must be included as part of the hardware platform, for example, as part of secure boot, trusted execution, or checks in the application layer.

The product development group must protect the kernel image in memory (following wakeup) from being corrupted by external intruders. The Zephyr OS has no way to determine whether its memory regions have been modified by externally driven changes. Therefore, harden any drivers and kernel applications that touch memory, and use XIP or ROM for code storage.

Integrity of Zephyr OS Memory Regions

Deprecated:

The developer must prevent outside interference with the integrity of the Zephyr image in memory after its activation. The Zephyr OS does not contain any internal mechanisms that will detect externally-driven changes to its memory regions. In a system that utilizes demand paging, both primary memory regions and backing store memory regions must be free from such interference.

Hardware verification

The developer must ensure the integrity of hardware used to run Zephyr OS and applications. The Zephyr OS does not provide platform validation for memory, CPU, and other hardware components.

Stack Memory Allocation

The developer must ensure all kernel tasks and threads are allocated sufficient stack memory.

Deprecated:

Some memory protections may be available but are not enabled by default due to performance and memory footprint impact. For example, stack canaries can be enabled for debugging purposes but are typically not used in production environments.

Entropy and Pseudo Random Number Generation

Deprecated:

To ensure the security of any cryptography used on Zephyr OS (for example, for secure communications), products must ensure that Zephyr OS pseudo random number generators are seeded with sufficient entropy. The ideal sources will depend heavily on the hardware and software sources of entropy available within a product. The Zephyr OS generic setup uses only time to ensure compatibility across devices, but time is not considered to be sufficient for good cryptographic security because it can allow an attacker a foothold into the system. This generic initialization (based on time) should never be used in a final product. All systems in the Zephyr project use the default CS-PRNG (Cryptographically Secure PRNG) and the project requires all added BSPs to supply a good seed. Random API efficiency is critical, since the API is called frequently.

Features such as "stack canaries" and "stack pointer randomization" require a random number generator to be selected. Zephyr provides a distinction between a hardware device capable of generating those numbers (entropy drivers) and software-based random number generators. Random number generation has to be set up with two configuration options:

  • One option enables an entropy driver for the particular platform. There's currently no other way of obtaining entropy in Zephyr devices without dedicated hardware. Not all platforms support this feature, though: ESP32, MCUX, STM32, and NRF5 devices do. Writing a driver for such a device is trivial in the sense that the interface consists of only one function.
  • The other option enables a random number generator. Two of the available generators must not be used in production systems and are only provided for testing purposes inside emulators:
    • Never enable the following random number generators in a production system: CONFIG_X86_TSC_RANDOM_GENERATOR and CONFIG_TIMER_RANDOM_GENERATOR
    • Prefer to use either:
      • CONFIG_ENTROPY_DEVICE_RANDOM_GENERATOR, which is a passthrough implementation that queries the entropy driver directly
      • CONFIG_XOROSHIRO_RANDOM_GENERATOR, which enables xoroshiro128, which uses the entropy driver only for its initial seed.
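To make the difference between the two preferred options more concrete, the following is a minimal sketch of a xoroshiro128+ generator that takes entropy only once, as its initial seed. The structure and function names are hypothetical, and the rotation and shift constants (55, 14, 36) are those of the 2016 public-domain reference implementation by Blackman and Vigna; Zephyr's own implementation may differ in detail. The 16 seed bytes are assumed to have been read from the platform's entropy driver by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator state: two 64-bit words. */
struct xoroshiro128 {
    uint64_t s[2];
};

static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/*
 * Load the state from 16 seed bytes (little-endian). The caller is
 * assumed to have filled `seed` from a hardware entropy source.
 * Returns 0 on success, -1 on a NULL argument or an all-zero seed,
 * since the all-zero state would only ever produce zeros.
 */
int xoroshiro128_seed(struct xoroshiro128 *g, const uint8_t seed[16])
{
    if (g == NULL || seed == NULL) {
        return -1;
    }
    g->s[0] = 0;
    g->s[1] = 0;
    for (size_t i = 0; i < 8; i++) {
        g->s[0] |= (uint64_t)seed[i] << (8 * i);
        g->s[1] |= (uint64_t)seed[8 + i] << (8 * i);
    }
    if (g->s[0] == 0 && g->s[1] == 0) {
        return -1;
    }
    return 0;
}

/* Produce the next 64-bit value and advance the state. */
uint64_t xoroshiro128_next(struct xoroshiro128 *g)
{
    const uint64_t s0 = g->s[0];
    uint64_t s1 = g->s[1];
    const uint64_t result = s0 + s1;

    s1 ^= s0;
    g->s[0] = rotl64(s0, 55) ^ s1 ^ (s1 << 14);
    g->s[1] = rotl64(s1, 36);
    return result;
}
```

After seeding, every output is a deterministic function of the state, so the quality of the whole sequence rests on the 16 bytes supplied at seed time. The passthrough option, by contrast, goes back to the entropy driver for every request.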

CPU Time Allocation

The developer must ensure tasks receive sufficient CPU time. The Zephyr OS task scheduler does not prevent tasks from experiencing CPU starvation. The task scheduler does time-slicing when multiple tasks of equal priority are ready to run, but this does not prevent a higher-priority task from starving lower-priority tasks.

ISRs and Floating Point Registers

The developer must ensure any ISRs they create do not use floating point registers, since the kernel core does not save and restore floating point context information when an ISR executes. Use of floating point in ISRs may cause the kernel to malfunction.

Do not store secrets in volatile floating point registers

Do not place secrets within volatile floating point registers. The Zephyr OS does not zero out volatile floating point registers when a non-preemptive context switch occurs.

Build Options

These are the build-time options that directly affect the security features provided by Zephyr. Each option has its own complete documentation that explains in detail how it should be used; the options are listed here for convenience.

  • If supported by the platform, CONFIG_USERSPACE must be enabled. This enables hardware-assisted memory protection features, and makes it harder for user threads to corrupt the kernel or access devices that they were not given explicit permission to use. This is documented in great detail in the documentation, which includes information about how memory is partitioned, how permission is given, and many other important details: http://docs.zephyrproject.org/kernel/usermode/usermode.html

    • When enabling userspace, it's recommended to define CONFIG_EXECUTE_XOR_WRITE. This will ensure that executable pages are not also writable. There's a performance penalty when setting these memory partitions, so this can be disabled in release builds with hardcoded memory domains.
    • It's recommended to split read-write memory sections between kernel and user areas by enabling the CONFIG_APPLICATION_MEMORY option.
  • On x86 platforms, the following recommendations should be followed:

    • It's recommended to enable CONFIG_X86_PAE_MODE, so that the XD (eXecute Disable) bit is used to disable execution of pages that do not contain program text.
    • On targets with speculative execution, CONFIG_RETPOLINE should be enabled to avoid branch target injection (aka Spectre V2).
    • The option CONFIG_GDT_DYNAMIC should be left at its default setting (disabled), to ensure that the GDT cannot be modified.
  • For stack protection, there are two options that can be enabled:

    • It is recommended to build Zephyr and the application with CONFIG_STACK_CANARIES enabled. This requires a random number generator; more details below.
    • CONFIG_HW_STACK_PROTECTION, as the name implies, requires supported hardware. This will catch some kinds of stack overflows while the CPU is in supervisor mode, and will panic the system should that happen. If CONFIG_USERSPACE is not enabled, the CPU is always executing in supervisor mode. If this option is enabled, it's a good idea to enable a watchdog timer to reboot the system and restart it from a known good state.
  • It's strongly recommended to set CONFIG_STACK_POINTER_RANDOM. This will randomize the initial stack pointer on thread creation, wasting up to the configured amount of bytes from the stack in an attempt to provide unpredictability where objects end up in memory. This also requires a random number generator.

  • Even though the kernel does not log a lot, it's good practice to compile out debugging messages in release builds by disabling CONFIG_THREAD_MONITOR, CONFIG_THREAD_STACK_INFO, CONFIG_SYS_LOG, CONFIG_KERNEL_DEBUG, and CONFIG_DEBUG.

  • It's recommended to keep CONFIG_CONSOLE_SHELL disabled, as it exposes information such as version and uptime. Similarly, CONFIG_BOOT_BANNER may be disabled to hide the same kind of information.

  • The option CONFIG_INIT_STACKS is not meant to be used as a security feature, but it will initialize thread stacks with a known value (0xaa) before they're first used. This helps in cases where threads are created and destroyed after system initialization, where there might be reuse of thread stack spaces.
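The known fill value written by CONFIG_INIT_STACKS can also help with the stack sizing responsibility described earlier in this document. The sketch below estimates how much of a stack buffer has never been touched by counting fill bytes from the low end. The function name is hypothetical, and the code assumes a downward-growing stack and the 0xaa fill value mentioned above; it is an illustration, not a Zephyr API.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill value written to thread stacks, as described for CONFIG_INIT_STACKS. */
#define STACK_FILL_BYTE 0xaau

/*
 * Count how many bytes at the low end of a stack buffer still hold the
 * fill value. Assumes the stack grows downward, so untouched bytes are
 * found starting at stack_base. The result is an estimate: a thread that
 * happens to write 0xaa at its deepest point would be undercounted.
 */
size_t stack_unused_bytes(const uint8_t *stack_base, size_t stack_size)
{
    size_t unused = 0;

    if (stack_base == NULL) {
        return 0;
    }
    while (unused < stack_size && stack_base[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A thread whose unused count approaches zero during testing is a candidate for a larger stack allocation.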

Configuring mbedTLS

mbedTLS, the library Zephyr distributes to provide SSL, TLS, and DTLS, alongside other related protocols and formats, is highly configurable at build time. The default configuration file follows recommendations set forth in RFC 7525, which documents the best known methods for TLS and DTLS deployments. However, the product development group should go through the configuration file and enable only the features being used in the product; this will not only reduce ROM footprint, but also reduce the attack surface. The mbedTLS project provides a number of sample configuration files that can be used as a starting point.

In addition to configuring mbedTLS at build time, it's important to note that its random number generator, while compliant with FIPS specs and thus presumably safe, requires a good source of entropy. This can be provided either by calling mbedtls_entropy_add_source() when mbedTLS is used directly, or by providing a callback function to the net_app library provided by Zephyr.
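For the direct case, mbedtls_entropy_add_source() expects a callback that fills a buffer and reports how many bytes it produced. The following is a minimal sketch of such a callback. The callback signature follows the mbedTLS entropy source interface, but `struct hw_entropy_ctx` and its `read` hook are hypothetical stand-ins for whatever entropy driver the platform provides, and the threshold of 32 bytes in the registration comment is only an example value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper around a platform entropy driver. */
struct hw_entropy_ctx {
    /* Fills buf with len bytes of entropy; returns 0 on success. */
    int (*read)(uint8_t *buf, size_t len);
};

/*
 * Entropy source callback in the shape mbedTLS expects:
 *   int f(void *data, unsigned char *output, size_t len, size_t *olen)
 * A nonzero return value signals failure. This sketch returns -1; a real
 * integration would likely return MBEDTLS_ERR_ENTROPY_SOURCE_FAILED.
 */
int hw_entropy_source(void *data, unsigned char *output, size_t len,
                      size_t *olen)
{
    const struct hw_entropy_ctx *ctx = data;

    if (olen == NULL) {
        return -1;
    }
    *olen = 0; /* report no bytes unless the read succeeds */

    if (ctx == NULL || ctx->read == NULL || output == NULL) {
        return -1;
    }
    if (ctx->read(output, len) != 0) {
        return -1;
    }
    *olen = len;
    return 0;
}

/*
 * Registration, shown as a comment because it needs the mbedTLS headers:
 *
 *   mbedtls_entropy_context entropy;
 *   mbedtls_entropy_init(&entropy);
 *   mbedtls_entropy_add_source(&entropy, hw_entropy_source, &ctx,
 *                              32, MBEDTLS_ENTROPY_SOURCE_STRONG);
 */
```

The callback fails closed: if the driver is missing or reports an error, no bytes are credited, so mbedTLS does not count entropy that was never delivered.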
