I/O MPT extension

Supervisor domains may be granted control over DMA-capable devices. When such direct device association is supported, the system might also incorporate multiple instances of IOMMU. Each IOMMU instance can be tied directly to a supervisor domain, allowing that domain to manage address translation and protection for DMA that originates from devices under its control.

To uphold isolation properties, the DMA from the devices and the IOMMU linked with a supervisor domain must adhere strictly to the access protections encoded in the MPT of the respective supervisor domain. Additionally, using the MPT, the RDSM enforces that the IOMMU memory-mapped programming regions are access-restricted to the supervisor domain the IOMMU is assigned to.

At any given time, a single supervisor domain is scheduled for execution on a RISC-V hart by the root domain security manager (RDSM). As part of this scheduling, the RDSM programs a pointer to the MPT into a CSR within the hart. Unlike the RISC-V harts, DMA-capable devices connected to a supervisor domain remain continuously active. Such devices might initiate DMA even if the associated domain is not currently active on any RISC-V hart. As a result, the MPTs of all supervisor domains must be constantly active for DMA protection. Furthermore, the I/O subsystem must be able to select the appropriate MPT for enforcement based on the identity of the device initiating the DMA.

Given this setup, the I/O subsystem is required to offer the following functions:

  • Supervisor Domain Classifier (SDCL): This classifier within the I/O subsystem interprets the attributes of a DMA request and determines the appropriate MPT for that request.

  • MPT Checker (MPTCHK): This function ensures that stipulated access controls by Smmpt are applied to the memory regions accessed by the DMA. It uses the MPT identified by the SDCL.

Collectively, these two functionalities form a logical block in the I/O subsystem, referred to as the I/O MPT checker.

Placement of I/O MPT checker

The IO Bridge serves as an intermediary, situated between the IO devices and the system interconnect, with the primary role of processing DMA transactions. These IO devices can initiate DMA transactions utilizing IO Virtual Addresses (IOVA). Notably, an IOVA could be in the form of a Virtual Address (VA), Guest Virtual Address (GVA), or Guest Physical Address (GPA). The configuration and interfacing of the I/O MPT Checker with respect to the IO Bridge is graphically represented in Figure 1.

Figure 1: I/O MPT checker placement

The IO Bridge invokes the SDCL function using the SDID request interface (SDR) and provides it the identifiers associated with the incoming transaction. The SDCL classifies the request using the identifiers and provides the IOSDID and the IOMMU ID on the SDID completion interface (SDC).

The IO Bridge uses the IOMMU ID to determine the IOMMU governing this request. The IO Bridge uses the device translation request (DTR) interface to the selected IOMMU to request address translation. The selected IOMMU provides the response to the address translation request on the device translation completion (DTC) interface. As part of the address translation process, the IOMMU may need to access its in-memory data structures over its data structure interface (DSI).

The IO Bridge invokes the MPTCHK function over the MPT check request (MCR) interface and provides it the IOSDID and physical address of the access. The MPTCHK uses the IOSDID to determine the MPT associated with the supervisor domain and checks if the physical address may be accessed by the device or IOMMU associated with that supervisor domain. The result of the check is provided on the MPT check completion (MCC) interface to the IO Bridge. As part of the MPT check, the MPTCHK may need to perform implicit accesses to the MPT using the MPT walk interface (MWI). To perform the checks the MPTCHK uses the same MPT table format as used by the CPU’s MMU. Using the same MPT table formats as the CPU’s MMU allows the same table to be used simultaneously by the CPU MMU and MPTCHK. The MPT access permission lookup process used by MPTCHK is identical to that specified by Smmpt extension in section "MPT access permissions lookup process".
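The sequence described above can be sketched as follows. This is only an illustrative Python model: the callables `sdcl`, `iommus`, and `mptchk` are hypothetical stand-ins for the SDR/SDC, DTR/DTC, and MCR/MCC interfaces, and treating an unmatched request as an abort is an assumption of the sketch rather than a requirement of this specification.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical signatures standing in for the hardware interfaces.
Sdcl = Callable[[int], Optional[Tuple[int, int]]]  # source ID -> (IOSDID, IOMMU ID)
Iommu = Callable[[int, int], Optional[int]]        # (source ID, IOVA) -> PA
MptChk = Callable[[int, int], bool]                # (IOSDID, PA) -> allowed?


def handle_dma(source_id: int, iova: int, sdcl: Sdcl,
               iommus: Dict[int, Iommu], mptchk: MptChk) -> Optional[int]:
    """Return the physical address to access, or None if the request is aborted."""
    match = sdcl(source_id)                 # SDR request, SDC completion
    if match is None:
        return None                         # assumption: unmatched requests abort
    iosdid, iommu_id = match
    pa = iommus[iommu_id](source_id, iova)  # DTR request, DTC completion
    if pa is None:
        return None                         # translation fault
    if not mptchk(iosdid, pa):              # MCR request, MCC completion
        return None                         # MPT check disallowed the access
    return pa
```

A real IO Bridge would also route the IOMMU's own data structure accesses through the MPT check; the sketch omits that path for brevity.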

The RISC-V memory model requires memory access from a hart to be single-copy atomic. When RV32 is implemented the size of a single-copy atomic memory access is up to 32-bits. When RV64 is implemented the size of a single-copy atomic memory access is up to 64-bits. The size of a single-copy atomic memory access implemented by MPTCHK is UNSPECIFIED but is required to be at least 32-bits if all of the harts in the system implement RV32 and is required to be at least 64-bits if any of the harts in the system implement RV64. Software must follow the rules outlined below to update MPT entries.

  • It is generally unsafe to update fields of an MPT entry using stores of width less than the minimal single-copy atomic memory access supported by MPTCHK as it is legal for MPTCHK to read the entry at any time, including when only some of the partial stores have taken effect. For an update to be atomic, software must use a single store of width equal to the minimal single-copy atomic memory access supported by MPTCHK.

  • MPTCHK is not required to immediately observe the software updates to an MPT entry. Software must use the MPTINVAL operation outlined in Control register (control) to invalidate any previous copies of that entry that may be in the MPTCHK caches to synchronize the updates to the entry with the operation of MPTCHK.

Note

If an MPT entry is changed, MPTCHK may use the old value of the entry or the new value of the entry and the choice is unpredictable until software uses the MPTINVAL operation to synchronize updates to the entry with the operation of the MPTCHK. These are the only behaviors expected.

The I/O MPT checker provides a memory-mapped register programming interface.

If an MPT check disallows a transaction then the transaction is aborted.

If the aborted transaction is an IOMMU-initiated memory access then the IO bridge signals such access faults to the IOMMU itself. The details of such signaling are implementation defined.

If the aborted transaction is a write then the IO bridge may discard the write; the details of how the write is discarded are implementation defined. If the IO protocol requires a response for write transactions (e.g., AXI) then a response as defined by the IO protocol may be generated by the IO bridge (e.g., SLVERR on BRESP - Write Response channel). For PCIe, for example, write transactions are posted and no response is returned when a write transaction is discarded.

If the faulting transaction is a read then the device expects a completion. The IO bridge may provide a completion to the device. The data, if returned, in such completion is implementation defined; usually it is a fixed value such as all 0 or all 1. A status code may be returned to the device in the completion to indicate this condition. For AXI, for example, the completion status is provided by SLVERR on RRESP (Read Data channel). For PCIe, for example, the completion status field may be set to "Unsupported Request" (UR) or "Completer Abort" (CA).

As part of its operations, MPTCHK may need to read data from the MPT. The provider of the data (a memory controller or a cache) may detect that the requested data has an uncorrectable error, signal that the data is corrupted, and defer the error to MPTCHK. This technique of deferring the handling of corrupted data to the consumer of the data is commonly known as data poisoning. The effects of such errors may be contained to the transaction that caused the corrupted data to be accessed. In cases where the error affects the transaction being processed but otherwise allows the MPTCHK to continue providing service, MPTCHK may request the IO bridge to abort the transaction. The MPTCHK may support the RISC-V RAS error record register interface (RERI), which specifies methods for enabling error detection, logging the detected errors, and configuring means to report the error to an error handler. When such a RAS architecture is supported, errors such as attempted consumption of poisoned data may be reported using the methods provided by the RAS architecture.

I/O MPT Checker Register Interface

Each I/O MPT checker (IOMPTCHK) register interface is memory-mapped starting at an 8-byte aligned physical address and includes the registers used to configure the SDCL and MPTCHK functions in the I/O MPT checker.

Note

Implementations may choose to implement a coarser alignment for the start address of the register interface. For example, some implementations may locate the register interface within a naturally aligned 4-KiB region (a page) of physical address space for each register interface. Coarser alignments may enable register decoding to be implemented without a hardware adder circuit.

The behavior for register accesses where the address is not aligned to the size of the access, or if the access spans multiple registers, or if the size of the access is not 4 bytes or 8 bytes, is UNSPECIFIED. An aligned 4 byte access to an IOMPTCHK register must be single-copy atomic. Whether an 8 byte access to an IOMPTCHK register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMPTCHK implementation, as if two separate 4 byte accesses were performed.

Note

The IOMPTCHK registers are defined in such a way that software can perform two individual 4 byte accesses, or hardware can perform two independent 4 byte transactions resulting from an 8 byte access, to the high and low halves of the register as long as the register semantics, with regards to side-effects, are respected between the two software accesses, or two hardware transactions, respectively.

The IOMPTCHK registers have little-endian byte order (even for systems where all harts are big-endian-only).

Note

Big-endian-configured harts that make use of I/O MPT may implement the REV8 byte-reversal instruction defined by the Zbb extension. If REV8 is not implemented, then endianness conversion may be implemented using a sequence of instructions.
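As a rough illustration of the byte-order rule, the following Python sketch shows the in-memory image of a 64-bit register value and how a byte reversal, which is what REV8 performs on RV64, recovers the register value from a big-endian load. The helper names are hypothetical and not part of this specification.

```python
def reg_to_bytes(value: int) -> bytes:
    """Little-endian byte image of a 64-bit IOMPTCHK register value."""
    return value.to_bytes(8, "little")


def rev8(value: int) -> int:
    """Byte-reverse a 64-bit value, mirroring what REV8 does on RV64."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


# A big-endian hart loading the register image sees the bytes reversed;
# applying the byte reversal recovers the intended register value.
image = reg_to_bytes(0x0000000000000110)
big_endian_load = int.from_bytes(image, "big")
recovered = rev8(big_endian_load)
```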

Table 1. I/O MPT Checker register layout
Offset  Name          Size  Description   Optional?
0       capabilities  8     Capabilities  No
8       control       8     Control       No
16      operand-0     8     Operand 0     No
24      operand-1     8     Operand 1     No

The reset value is 0 for the following register fields.

  • control - BUSY and STATUS fields

The reset value is UNSPECIFIED for all other registers and/or fields.

Capabilities (capabilities)

The capabilities register is a read-only register that holds the I/O MPT checker capabilities.

Capabilities register fields
{reg: [
  {bits:  8, name: 'VER'},
  {bits:  1, name: 'MPTM'},
  {bits: 39, name: 'WPRI'},
  {bits: 16, name: 'custom'}
], config:{lanes: 4, hspace:1024}}

The VER field holds the version of the specification implemented by the I/O MPT checker. The low nibble is used to hold the minor version of the specification and the upper nibble is used to hold the major version of the specification. For example, an implementation that supports version 1.0 of the specification reports 0x10.

The MPTM field indicates the supported MPT address protection schemes. If 1, then the MPT modes for RV64 are supported else the MPT modes for RV32 are supported.
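A minimal sketch of decoding the capabilities register is shown below. It assumes that the fields in the register diagram are listed starting from the least significant bit (VER in bits 7:0, MPTM in bit 8, custom in bits 63:48); the `Capabilities` type and `decode_capabilities` function are illustrative names only.

```python
from typing import NamedTuple


class Capabilities(NamedTuple):
    major: int            # upper nibble of VER
    minor: int            # low nibble of VER
    rv64_mpt_modes: bool  # MPTM == 1: MPT modes for RV64, else those for RV32
    custom: int           # custom field, bits 63:48


def decode_capabilities(raw: int) -> Capabilities:
    ver = raw & 0xFF               # VER: bits 7:0 (assumed LSB-first layout)
    mptm = (raw >> 8) & 0x1        # MPTM: bit 8
    custom = (raw >> 48) & 0xFFFF  # custom: bits 63:48
    return Capabilities(ver >> 4, ver & 0xF, bool(mptm), custom)
```

For example, a raw value of 0x110 would decode as version 1.0 with the RV64 MPT modes supported.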

Control register (control)

The control register is used to control classification of DMA requests using the identifiers associated with the DMA requests to determine the associated IO supervisor domain ID (IOSDID) and the Machine-level Memory Protection Tables (MMPT).

Control register (control)
{reg: [
  {bits:  8, name: 'OP (WARL)'},
  {bits: 16, name: 'RULEID (WARL)'},
  {bits:  8, name: 'WPRI'},
  {bits:  7, name: 'STATUS (RO)'},
  {bits:  1, name: 'BUSY (RO)'},
  {bits: 24, name: 'WPRI'}
], config:{lanes: 8, hspace:1024}}

The OP field is used to instruct IOMPTCHK to perform an operation listed in I/O MPT checker operations (OP). The RULEID is the identifier of the rule in the SDCL function to operate on. The RULEID value of 0 indicates that the operation applies to all rules and is supported only if explicitly specified by an operation.

Table 2. I/O MPT checker operations (OP)
Operation  Encoding  Description
-          0         Reserved for future standard use.
SET_ENTRY  1         Configure the rule identified by RULEID with the operands specified in operand-0 and operand-1 registers.
GET_ENTRY  2         Read the configurations of a rule identified by RULEID. On successful completion of the operation, the operand-0 and operand-1 registers hold the current configurations of the rule. If the operation is not successful then the contents of operand-0 and operand-1 are UNSPECIFIED.
MPTINVAL   3         This operation ensures that stores to an MPT are observed by MPTCHK before subsequent implicit reads by MPTCHK to the corresponding MPT.
IOFENCE    4         This command can be used to request that IOMPTCHK ensure that all previous read and write requests from devices that have already been processed by IOMPTCHK be committed to a global ordering point such that they can be observed by all RISC-V harts, IOMMUs and devices in the system.
-          5-127     Reserved for future standard use.
-          128-255   Designated for custom use.

When the control register is written, IOMPTCHK may need to perform several actions that may not complete synchronously with the write. A write to the control register sets the BUSY bit to 1, indicating that IOMPTCHK is performing the requested operation. The behavior of writing the control register when the BUSY bit is 1 is UNSPECIFIED. Some implementations may ignore the second write and others may perform the operation determined by the second write. Software must verify that BUSY is 0 before writing the control register.

Note

An implementation that can always perform the requested operation synchronously with the write to control register may hardwire the BUSY field to 0.

When the BUSY bit reads 0 the operation is complete and the STATUS field provides a status value (control.STATUS field encodings) of the requested operation.

Table 3. control.STATUS field encodings
STATUS   Description
0        Reserved
1        Operation was successfully completed.
2        Invalid operation (OP) requested.
3        Operation requested for invalid RULEID.
4        Illegal/invalid operand encodings used.
5-127    Reserved for future standard use.
128-255  Designated for custom use.
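The BUSY/STATUS handshake can be sketched in Python as follows. The sketch assumes the control register fields are laid out starting from the least significant bit (OP in bits 7:0, RULEID in bits 23:8, STATUS in bits 38:32, BUSY in bit 39). `FakeChecker` is a hypothetical software stand-in for the memory-mapped interface, and the polling limit is an arbitrary choice for illustration.

```python
from typing import Tuple

OP_SET_ENTRY, OP_GET_ENTRY, OP_MPTINVAL, OP_IOFENCE = 1, 2, 3, 4
STATUS_SUCCESS, STATUS_INVALID_OP = 1, 2
BUSY_BIT = 1 << 39


def encode_control(op: int, ruleid: int = 0) -> int:
    """Build a control register value from OP and RULEID."""
    return (op & 0xFF) | ((ruleid & 0xFFFF) << 8)


def decode_control(raw: int) -> Tuple[int, bool]:
    """Return (STATUS, BUSY) from a control register value."""
    return (raw >> 32) & 0x7F, bool(raw & BUSY_BIT)


class FakeChecker:
    """Hypothetical register model that reports BUSY for a few reads."""

    def __init__(self, busy_reads: int = 2) -> None:
        self._op_ruleid = 0
        self._status = 0  # BUSY and STATUS reset to 0
        self._busy_left = 0
        self._busy_reads = busy_reads

    def write_control(self, value: int) -> None:
        self._op_ruleid = value & 0xFFFFFF
        valid = (value & 0xFF) in (OP_SET_ENTRY, OP_GET_ENTRY, OP_MPTINVAL, OP_IOFENCE)
        self._status = STATUS_SUCCESS if valid else STATUS_INVALID_OP
        self._busy_left = self._busy_reads

    def read_control(self) -> int:
        if self._busy_left > 0:
            self._busy_left -= 1
            return self._op_ruleid | BUSY_BIT
        return self._op_ruleid | (self._status << 32)


def issue(dev: FakeChecker, op: int, ruleid: int = 0, max_polls: int = 1000) -> int:
    """Issue one operation and return its STATUS once BUSY reads 0."""
    _, busy = decode_control(dev.read_control())
    if busy:
        raise RuntimeError("control must not be written while BUSY is 1")
    dev.write_control(encode_control(op, ruleid))
    for _ in range(max_polls):
        status, busy = decode_control(dev.read_control())
        if not busy:
            return status
    raise TimeoutError("operation did not complete")
```

Driver code would program operand-0 and operand-1 before calling `issue` for SET_ENTRY or MPTINVAL, and read them afterwards for GET_ENTRY.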

Before requesting the SET_ENTRY operation using the control register, software should program the fields of the operand-0 and operand-1 registers. The SET_ENTRY operation utilizes the following fields from the operand-0 and operand-1 registers: SRC_IDT, SRC_IDM, TEE_FLT, SRC_ID, IOMMU_ID, IOSDID, MPT_MODE, SRL, SML, SQRID and PPN.

If multiple rules are programmed to match a transaction, the implementation may act based on any one of those matching rules. However, if a transaction does not match any of the rules, the IO Bridge is notified of this condition. The subsequent behavior of the IO Bridge for unmatched transactions remains UNSPECIFIED.

An implementation that performs the requested operation synchronously may hardwire the BUSY bit to 0.

The GET_ENTRY operation ignores the contents of both the operand-0 and operand-1 registers. If the GET_ENTRY operation is unsuccessful, the contents of these registers remain UNSPECIFIED. However, upon a successful GET_ENTRY operation, the configurations of the rule identified by control.RULEID are provided in the following fields: SRC_IDT, SRC_IDM, TEE_FLT, SRC_ID, IOMMU_ID, IOSDID, MPT_MODE, SRL, SML, SQRID, and PPN. The state of all other fields in the operand-0 and operand-1 registers is UNSPECIFIED.

The contents of RULEID, operand-0 and operand-1 are disregarded by the IOFENCE operation.

The MPTINVAL operation utilizes the IOSDID field of operand-0 register and utilizes the following fields from the operand-1 register: PPNV, PPN, IOSDIDV, and S. The contents of RULEID and all other fields of operand-0 and operand-1 register are disregarded by the MPTINVAL operation.

Note

If an identical IOSDID is configured in two rules but the MPT referenced by the rules is not identical then it is unpredictable whether the MPT referenced by the first rule or the second rule will be used. These are the only expected behaviors.

Operand 0 register (operand-0)

The operand-0 register holds the input operands or the output results of operations requested through control.OP.

Operand-0 register (operand-0)
{reg: [
  {bits:  4, name: 'SRC_IDT (WARL)'},
  {bits:  2, name: 'SRC_IDM (WARL)'},
  {bits:  2, name: 'TEE_FLT (WARL)'},
  {bits: 24, name: 'SRC_ID'},
  {bits:  8, name: 'IOMMU_ID (WARL)'},
  {bits:  8, name: 'IOSDID (WARL)'},
  {bits:  4, name: 'SRL'},
  {bits:  4, name: 'SML'},
  {bits:  4, name: 'SQRID'},
  {bits:  4, name: 'WPRI'}
], config:{lanes: 8, hspace:1024}}

The SRC_IDT field identifies the type of identifier from the DMA transaction used by this classification rule. The SRC_IDT encodings are listed in operand-0.SRC_IDT field encodings.

Table 4. operand-0.SRC_IDT field encodings
SRC_IDT  Description
0        None. This rule does not match any incoming transaction. All other fields of the operand-0 and operand-1 register are ignored if the control.OP is SET_ENTRY. All other fields of operand-0 and operand-1 register are UNSPECIFIED if the control.OP is GET_ENTRY.
1        Filter by device ID. The device ID is specified in SRC_ID field and may be up to 24-bit wide.
2        Filter by PCIe IDE stream ID and PCIe segment ID. The IDE stream ID is specified in the bits 7:0 of the SRC_ID field and the segment ID in bits 15:8 of the SRC_ID. The bits 23:16 of the SRC_ID field are ignored.
3 - 7    Reserved for future standard use.
8 - 15   Designated for custom use.

Note

In PCIe systems, an originating device can be pinpointed using a unique 16-bit identifier. This identifier is a composite of the PCI bus number (8 bits), device number (5 bits), and function number (3 bits), collectively referred to as the routing identifier or RID. In scenarios where an IOMMU manages multiple hierarchies, there’s also an optional segment number, which can be up to 8 bits. Each hierarchy in this context represents a distinct PCI Express I/O interconnect topology. Here, the Configuration Space addresses, which are delineated by the Bus, Device, and Function number tuple, remain distinct. Sometimes, the term Hierarchy is synonymous with Segment. Especially when in Flit Mode, the Segment number can be part of a Function’s ID.
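The two SRC_ID layouts described above can be sketched as small Python helpers. The placement of the segment number in the upper 8 bits of a 24-bit device ID follows the seg:bus:dev:func ordering used in this chapter; the function names are hypothetical.

```python
def pcie_device_id(segment: int, bus: int, device: int, function: int) -> int:
    """Compose a 24-bit device ID as segment:bus:device:function."""
    if not (0 <= segment < 256 and 0 <= bus < 256
            and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("field out of range")
    rid = (bus << 8) | (device << 3) | function  # 16-bit routing identifier
    return (segment << 16) | rid


def ide_src_id(stream_id: int, segment: int) -> int:
    """SRC_ID for filtering by IDE stream: stream ID in bits 7:0, segment in bits 15:8."""
    if not (0 <= stream_id < 256 and 0 <= segment < 256):
        raise ValueError("field out of range")
    return (segment << 8) | stream_id
```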

The SRC_IDM field can configure SRC_ID matching mode for transactions. The SRC_IDM encodings are listed in operand-0.SRC_IDM field encodings.

Table 5. operand-0.SRC_IDM field encodings
SRC_IDM  Description
0        Reserved for future standard use.
1        Unary. If Unary is selected, then this rule matches if all the bits of the source ID of the transaction match the value configured in the SRC_ID field.
2        NAPOT. If NAPOT is selected, then the rule matches a naturally aligned power-of-two range of source IDs. In this mode, the lower bits of the SRC_ID, up to and including the first low-order zero bit, are masked; the unmasked bits are compared with the corresponding bits in the source ID of the transaction to match.
3        TOR. If TOR (Top-Of-Range) is selected, the SRC_ID field forms the top of a range of source IDs. If rule r's SRC_IDM is set to TOR, the rule matches any source ID s if: s is greater than or equal to SRC_ID of rule r-1 and is less than the SRC_ID of rule r. If r is 0, then zero is used as the lower bound. If SRC_ID of rule r-1 is greater than or equal to that of rule r and TOR is selected for rule r, then rule r does not match any source ID.

Note

The following example illustrates the use of SRC_IDM=NAPOT when SRC_IDT selects filtering by device ID and a 24-bit PCIe device ID composed of the segment, bus, device, and function numbers is used. In the table below, y is a placeholder for any 1-bit value.

Table 6. SRC_IDM with SRC_IDT set to Filter by device ID
SRC_IDM  SRC_ID                      Comment
1        yyyyyyyy yyyyyyyy yyyyyyyy  One specific seg:bus:dev:func
2        yyyyyyyy yyyyyyyy yyyyy011  seg:bus:dev - any func
2        yyyyyyyy yyyyyyyy 01111111  seg:bus - any dev:func
2        yyyyyyyy 01111111 11111111  seg - any bus:dev:func
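The three matching modes can be modeled with a short Python sketch. It is a software illustration only, not a description of any hardware implementation; `rule_matches` and `napot_mask` are hypothetical names, and treating an all-ones SRC_ID in NAPOT mode as matching every source ID is an assumption of the sketch.

```python
SRC_ID_BITS = 24
SRC_ID_MASK = (1 << SRC_ID_BITS) - 1
UNARY, NAPOT, TOR = 1, 2, 3


def napot_mask(src_id: int) -> int:
    """Mask of the SRC_ID bits that are compared in NAPOT mode."""
    x = 0  # position of the first low-order zero bit
    while x < SRC_ID_BITS and (src_id >> x) & 1:
        x += 1
    ignored = (1 << (x + 1)) - 1  # bits x:0 are masked, including the zero bit
    return SRC_ID_MASK & ~ignored


def rule_matches(mode: int, src_id: int, source: int, prev_src_id: int = 0) -> bool:
    """Check a transaction's source ID against one rule.

    prev_src_id is the SRC_ID of rule r-1 and is used only for TOR.
    """
    if mode == UNARY:
        return source == src_id
    if mode == NAPOT:
        mask = napot_mask(src_id)
        return (source & mask) == (src_id & mask)
    if mode == TOR:
        # Empty range (prev_src_id >= src_id) never matches.
        return prev_src_id <= source < src_id
    return False  # reserved encoding
```

For instance, a NAPOT rule with SRC_ID 0x01021B (segment 1, bus 2, device 3, low bits 011) matches every function of that device but no function of device 4.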

The TEE_FLT field may be used to filter transactions associated with a Trusted Execution Environment (TEE). The encodings for the TEE_FLT field can be found in operand-0.TEE_FLT field encodings.

Table 7. operand-0.TEE_FLT field encodings
TEE_FLT  Description
0        Reserved for future standard use.
1        Rule matches TEE-associated transactions.
2        Rule matches transactions that are not TEE associated.
3        Rule matches both TEE-associated and non-TEE associated transactions.
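One way to read these encodings is as a two-bit mask in which bit 0 selects TEE-associated transactions and bit 1 selects the others. This reading is an observation made for the sketch below, not wording from the specification, and `tee_flt_matches` is a hypothetical helper.

```python
def tee_flt_matches(tee_flt: int, is_tee: bool) -> bool:
    """Apply the TEE_FLT filter to one transaction."""
    if tee_flt not in (1, 2, 3):
        raise ValueError("reserved TEE_FLT encoding")
    selector = 0b01 if is_tee else 0b10  # bit 0: TEE, bit 1: non-TEE
    return bool(tee_flt & selector)
```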

Note

PCIe IDE provides security for transactions from one Port to another. These transactions might be initiated by contexts within the device, such as an SR-IOV virtual function, which are associated with a Trusted Execution Environment (TEE). Within the IDE TLP header, there’s a "T" bit that helps differentiate transactions related to a TEE. The TEE_FLT filter can be employed to associate these TEE-related transactions with a different supervisor domain than the transactions not related to TEE. This distinction is made even if both types of transactions are received on the same PCIe IDE stream.

Fields such as TEE_FLT and IOMMU_ID are WARL and may be hardwired to 0 if the implementation does not support PCIe IDE and/or an IOMMU.

The IOMMU_ID field identifies the instance of the IOMMU that should be used to provide address translation and protection for the transactions matching this rule.

The IOSDID field identifies the supervisor domain whose memory is accessed by this transaction. When operand-1.MPT_MODE is Bare, the SET_ENTRY operation requires the IOSDID field to be zero.

The SRL and SML fields along with operand-1.SSM field are used to determine the effective RCID and MCID provided by the IOMMU for device originated requests. The determination of the effective RCID and MCID is as specified by [SMQOSID]. The SQRID identifies the QRI for requests originating from the devices and the IOMMU associated with the SD and accompanies the RCID and MCID in the requests made by the device to the QRI.
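A minimal sketch of packing and unpacking operand-0 follows, assuming the fields in the register diagram are listed starting from the least significant bit. The `pack` and `unpack` helpers and the example values are illustrative only.

```python
from typing import Dict, List, Tuple

# (name, width) pairs, least significant field first (assumed layout).
OPERAND0_FIELDS: List[Tuple[str, int]] = [
    ("SRC_IDT", 4), ("SRC_IDM", 2), ("TEE_FLT", 2), ("SRC_ID", 24),
    ("IOMMU_ID", 8), ("IOSDID", 8), ("SRL", 4), ("SML", 4),
    ("SQRID", 4), ("WPRI", 4),
]


def pack(fields: List[Tuple[str, int]], **values: int) -> int:
    """Pack named field values into a 64-bit register value."""
    unknown = set(values) - {name for name, _ in fields}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    raw, shift = 0, 0
    for name, width in fields:
        value = values.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        raw |= value << shift
        shift += width
    return raw


def unpack(fields: List[Tuple[str, int]], raw: int) -> Dict[str, int]:
    """Split a 64-bit register value into its named fields."""
    out, shift = {}, 0
    for name, width in fields:
        out[name] = (raw >> shift) & ((1 << width) - 1)
        shift += width
    return out


# Example: match one device ID (Unary), TEE and non-TEE, IOMMU 2, IOSDID 5.
operand0 = pack(OPERAND0_FIELDS, SRC_IDT=1, SRC_IDM=1, TEE_FLT=3,
                SRC_ID=0x01021D, IOMMU_ID=2, IOSDID=5)
```

The same helpers can be reused for other registers by supplying a different field list.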

Operand 1 register (operand-1)

The operand-1 register holds the input operands or the output results of operations requested through control.OP.

Operand-1 register (operand-1)
{reg: [
  {bits:  4, name: 'MPT_MODE (WARL)'},
  {bits:  1, name: 'PPNV (WARL)'},
  {bits:  1, name: 'S (WARL)'},
  {bits:  1, name: 'IOSDIDV'},
  {bits:  1, name: 'SSM'},
  {bits:  2, name: 'WPRI'},
  {bits: 44, name: 'PPN'},
  {bits: 10, name: 'WPRI'}
], config:{lanes: 8, hspace:1024}}

The MPT_MODE field identifies the mode of the MPT. It is interpreted as outlined in [mpt-32] when capabilities.MPTM is 0, and as detailed in [mpt-64] when capabilities.MPTM is 1. The MPT_MODE field is programmed into the rule identified by RULEID via the SET_ENTRY operation and can be retrieved by the GET_ENTRY operation. Both the IOFENCE and MPTINVAL operations disregard the MPT_MODE field.

The PPN field programs the PPN of the root page of the MPT during the SET_ENTRY operation and is retrieved by the GET_ENTRY operation. When MPT_MODE is Bare, the SET_ENTRY operation requires the PPN field to be zero. The IOFENCE operation disregards this field.

For the MPTINVAL operation, the PPNV field indicates if the PPN field is valid and the IOSDIDV field indicates if the IOSDID field is valid for the operation. When a field is not valid for an operation, it is ignored by the operation. When the PPNV field is 1, the S field sets the address range size for the MPTINVAL operation. When the S field is 0, the range size is 4 KiB. When the S field is 1, the MPTINVAL operation applies to a NAPOT range. This range is determined by the low-order bits of the PPN field, up to and including the first low-order 0 bit. If the position of the first low-order 0 bit is denoted as x, the size of the range is computed as (1 << (12 + x + 1)). When PPNV is set to 1, if the address range specified by PPN and S is invalid, the operation may or may not be performed. Operations besides MPTINVAL disregard the PPNV field.

The MPTINVAL operation ensures that stores to the MPT are observed by MPTCHK before subsequent implicit reads by MPTCHK to the corresponding MPT.

Table 8. MPTINVAL operands and operations
PPNV  IOSDIDV  Operation
0     0        Invalidates information cached from any MPT for all supervisor domain address spaces.
0     1        Invalidates information cached from the MPT for the address space of the supervisor domain identified by the IOSDID operand.
1     0        Invalidates information cached from the MPT for the address range in the PPN operand for all supervisor domain address spaces.
1     1        Invalidates information cached from the MPT for the address range in the PPN operand for the supervisor domain address space identified by the IOSDID operand.
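The four combinations above can be modeled against a toy cache. In this Python sketch the cache is a set of hypothetical (IOSDID, PPN) pairs and the address range is passed already decoded as a base PPN and a page count; none of these names or representations come from the specification.

```python
from typing import Set, Tuple

CacheEntry = Tuple[int, int]  # (IOSDID, PPN) of a cached MPT lookup result


def mptinval(cache: Set[CacheEntry], *, ppnv: bool = False, iosdidv: bool = False,
             iosdid: int = 0, ppn_base: int = 0, ppn_count: int = 1) -> Set[CacheEntry]:
    """Return the cache contents that survive an MPTINVAL with the given operands."""

    def is_invalidated(entry: CacheEntry) -> bool:
        entry_iosdid, entry_ppn = entry
        if iosdidv and entry_iosdid != iosdid:
            return False  # other supervisor domains are left alone
        if ppnv and not (ppn_base <= entry_ppn < ppn_base + ppn_count):
            return False  # addresses outside the range are left alone
        return True

    return {entry for entry in cache if not is_invalidated(entry)}
```

An implementation remains free to invalidate more than requested, so a real cache may drop entries that this model would keep.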

Note

The following example illustrates the use of S field to specify an address range for the MPTINVAL operation. The example shows encoding ranges of up to 8 GiB. Larger ranges may be encoded using the upper address bits (bits 43:22) of the PPN field.

Table 9. Examples of specifying address range sizes using S field
PPN[21:0]                S  Address Range Size
yyyyy yyyyyyyy yyyyyyyy  0  4 KiB
yyyyy yyyyyyyy yyyyyyy0  1  8 KiB
yyyyy yyyyyyy0 11111111  1  2 MiB
yyy01 11111111 11111111  1  1 GiB
01111 11111111 11111111  1  8 GiB
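The range encoding can be checked with a small Python sketch that converts between a (base address, size) pair and the PPN and S operands, using the size formula (1 << (12 + x + 1)). The function names are hypothetical, and rejecting unaligned or non-power-of-two ranges is a choice made for this sketch.

```python
from typing import Tuple

PAGE_SHIFT = 12


def encode_range(base: int, size: int) -> Tuple[int, int]:
    """Return (PPN, S) for a naturally aligned power-of-two range of at least 4 KiB."""
    if size < (1 << PAGE_SHIFT) or size & (size - 1) or base % size:
        raise ValueError("range must be a naturally aligned power of two")
    ppn = base >> PAGE_SHIFT
    if size == (1 << PAGE_SHIFT):
        return ppn, 0
    pages = size >> PAGE_SHIFT
    # Set the low log2(pages) - 1 bits to 1; the next bit is already 0 by alignment.
    return ppn | ((pages >> 1) - 1), 1


def decode_range(ppn: int, s: int) -> Tuple[int, int]:
    """Return (base address, size in bytes) for the PPN and S operands."""
    if s == 0:
        return ppn << PAGE_SHIFT, 1 << PAGE_SHIFT
    x = 0  # position of the first low-order 0 bit
    while (ppn >> x) & 1:
        x += 1
    base_ppn = ppn & ~((1 << (x + 1)) - 1)
    return base_ppn << PAGE_SHIFT, 1 << (PAGE_SHIFT + x + 1)
```

For example, a 2 MiB range at address 0x40200000 encodes as PPN 0x402FF with S set to 1.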

Note

Simpler implementations may ignore the operands of MPTINVAL operation and perform a global invalidation of all information cached from any MPT.

A consequence of this specification is that an implementation may use any information for an address that was valid in the MPT at any time since the most recent MPTINVAL that subsumes that address.

Another consequence of this specification is that it is generally unsafe to update the MPT using a set of stores of a width less than the width of the MPT entry, as it is legal for the implementation to read the MPT entries at any time, including when only some of the partial stores have taken effect.

The IOMMU itself is a DMA-capable device. DMA performed by the IOMMU uses the device ID of the IOMMU. A rule must be defined to associate the IOMMU device ID with an IOSDID and MPT, unless the IOMMU device ID is already covered by another rule that associates device IDs with a supervisor domain.