Supervisor domains may be granted control over DMA-capable devices. When such direct device association is supported, the system might also incorporate multiple instances of IOMMU. Each IOMMU instance can be tied directly to a supervisor domain, allowing that domain to manage address translation and protection for DMA that originates from devices under its control.
To uphold isolation properties, the DMA from the devices and the IOMMU linked with a supervisor domain must adhere strictly to the access protections encoded in the MPT of the respective supervisor domain. Additionally, using the MPT, the RDSM enforces that the IOMMU memory-mapped programming regions are access-restricted to the supervisor domain the IOMMU is assigned to.
At any given time, a single supervisor domain is scheduled for execution on a RISC-V hart by the root domain security manager (RDSM). As part of this scheduling, the RDSM programs a pointer to the MPT into a CSR within the hart. Unlike the RISC-V harts, DMA-capable devices connected to a supervisor domain remain continuously active. Such devices might initiate DMA even if the associated domain is not currently active on any RISC-V hart. As a result, the MPTs of all supervisor domains must remain active at all times for DMA protection. Furthermore, the I/O subsystem must be able to select the appropriate MPT for enforcement based on the identity of the device initiating the DMA.
Given this setup, the I/O subsystem is required to offer the following functions:
- Supervisor Domain Classifier (SDCL): This classifier within the I/O subsystem interprets the attributes of a DMA request and determines the appropriate MPT for that request.
- MPT Checker (MPTCHK): This function ensures that the access controls stipulated by Smmpt are applied to the memory regions accessed by the DMA. It uses the MPT identified by the SDCL.
Collectively, these two functionalities form a logical block in the I/O subsystem, referred to as the I/O MPT checker.
The IO Bridge serves as an intermediary, situated between the IO devices and the system interconnect, with the primary role of processing DMA transactions. These IO devices can initiate DMA transactions utilizing IO Virtual Addresses (IOVA). Notably, an IOVA could be in the form of a Virtual Address (VA), Guest Virtual Address (GVA), or Guest Physical Address (GPA). The configuration and interfacing of the I/O MPT Checker with respect to the IO Bridge is graphically represented in Figure 1.
The IO Bridge invokes the SDCL function using the SDID request interface (SDR) and provides it the identifiers associated with the incoming transaction. The SDCL classifies the request using the identifiers and provides the IOSDID and the IOMMU ID on the SDID completion interface (SDC).
The IO Bridge uses the IOMMU ID to determine the IOMMU governing this request. The IO Bridge uses the device translation request (DTR) interface to the selected IOMMU to request address translation. The selected IOMMU provides the response to the address translation request on the device translation completion (DTC) interface. As part of the address translation process, the IOMMU may need to access its in-memory data structures over its data structure interface (DSI).
The IO Bridge invokes the MPTCHK function over the MPT check request (MCR)
interface and provides it the IOSDID and physical address of the access. The
MPTCHK uses the IOSDID to determine the MPT associated with the supervisor
domain and checks if the physical address may be accessed by the device or IOMMU
associated with that supervisor domain. The result of the check is provided on
the MPT check completion (MCC) interface to the IO Bridge. As part of the MPT
check, the MPTCHK may need to perform implicit accesses to the MPT using the MPT
walk interface (MWI). To perform the checks, the MPTCHK uses the same MPT
format as the CPU’s MMU, which allows a single table to be used simultaneously
by the CPU MMU and MPTCHK. The MPT access permission lookup process used by
MPTCHK is identical to that specified by the Smmpt extension in section
"MPT access permissions lookup process".
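The request flow described above (classify, translate, then check) can be sketched as a small software model. This is only an illustration under stated assumptions: the function and class names, the callback signatures, and the choice to abort unmatched transactions are all hypothetical, and real hardware performs these steps over the SDR/SDC, DTR/DTC and MCR/MCC interfaces rather than through function calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Classification:
    iosdid: int    # supervisor domain ID reported on the SDC interface
    iommu_id: int  # IOMMU instance that governs the request


def process_dma(source_id: int, iova: int, is_write: bool,
                sdcl: Callable[[int], Optional[Classification]],
                iommus: Dict[int, Callable[[int], Optional[int]]],
                mptchk: Callable[[int, int, bool], bool]) -> Optional[int]:
    """Return the physical address if the DMA may proceed, else None (abort)."""
    cls = sdcl(source_id)                      # SDR request, SDC completion
    if cls is None:
        return None  # no rule matched; this sketch chooses to abort
    pa = iommus[cls.iommu_id](iova)            # DTR request, DTC completion
    if pa is None:
        return None  # translation fault
    if not mptchk(cls.iosdid, pa, is_write):   # MCR request, MCC completion
        return None  # MPT check disallowed the access
    return pa


# Toy models (hypothetical): one rule, one IOMMU that adds a fixed offset,
# and one page that supervisor domain 1 may only read.
rules = {0x100: Classification(iosdid=1, iommu_id=0)}
toy_iommus = {0: lambda iova: iova + 0x8000_0000}
readonly_pages = {(1, 0x8000_1000 >> 12)}


def toy_mptchk(iosdid: int, pa: int, is_write: bool) -> bool:
    return (iosdid, pa >> 12) in readonly_pages and not is_write


print(process_dma(0x100, 0x1000, False, rules.get, toy_iommus, toy_mptchk))
print(process_dma(0x100, 0x1000, True, rules.get, toy_iommus, toy_mptchk))
```

In this toy setup the read is allowed and the write to the same page is aborted, because the MPT check runs on the translated physical address using the IOSDID chosen by the classifier.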
The RISC-V memory model requires memory access from a hart to be single-copy atomic. When RV32 is implemented the size of a single-copy atomic memory access is up to 32-bits. When RV64 is implemented the size of a single-copy atomic memory access is up to 64-bits. The size of a single-copy atomic memory access implemented by MPTCHK is UNSPECIFIED but is required to be at least 32-bits if all of the harts in the system implement RV32 and is required to be at least 64-bits if any of the harts in the system implement RV64. Software must follow the rules outlined below to update MPT entries.
- It is generally unsafe to update fields of an MPT entry using stores of width less than the minimal single-copy atomic memory access supported by MPTCHK as it is legal for MPTCHK to read the entry at any time, including when only some of the partial stores have taken effect. For an update to be atomic, software must use a single store of width equal to the minimal single-copy atomic memory access supported by MPTCHK.
- MPTCHK is not required to immediately observe the software updates to an MPT entry. Software must use the MPTINVAL operation outlined in Control register (control) to invalidate any previous copies of that entry that may be in the MPTCHK caches to synchronize the updates to the entry with the operation of MPTCHK.
Note
|
If an MPT entry is changed, MPTCHK may use the old value of the entry or the
new value of the entry and the choice is unpredictable until software uses the
|
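The hazard behind the first rule can be illustrated with a small model of memory. This is a sketch only: the entry values are arbitrary placeholders rather than a real MPT entry encoding, and `checker_read` is a hypothetical stand-in for an MPTCHK read of a 64-bit entry.

```python
import struct

OLD = 0x1111_1111_2222_2222  # placeholder "old" entry value
NEW = 0xAAAA_AAAA_BBBB_BBBB  # placeholder "new" entry value

mpt = bytearray(8)           # one 64-bit entry in little-endian memory
struct.pack_into("<Q", mpt, 0, OLD)


def checker_read(mem: bytearray) -> int:
    """Model MPTCHK reading the whole entry in one 64-bit access."""
    return struct.unpack_from("<Q", mem, 0)[0]


# Unsafe: two 32-bit stores. A read that lands between them sees a mix
# of the old upper half and the new lower half.
struct.pack_into("<I", mpt, 0, NEW & 0xFFFF_FFFF)
torn = checker_read(mpt)
struct.pack_into("<I", mpt, 4, NEW >> 32)

# Safe: a single 64-bit store. A read sees either OLD or NEW, never a mix.
struct.pack_into("<Q", mpt, 0, OLD)
struct.pack_into("<Q", mpt, 0, NEW)
whole = checker_read(mpt)

print(hex(torn), hex(whole))
```

Even with a single full-width store, the second rule still applies: software follows the store with the MPTINVAL operation before relying on the new entry.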
The I/O MPT checker provides a memory-mapped register programming interface.
If an MPT check disallows a transaction then the transaction is aborted.
If the aborted transaction is an IOMMU-initiated memory access then the IO bridge signals such access faults to the IOMMU itself. The details of such signaling are implementation defined.
If the aborted transaction is a write then the IO bridge may discard the write; the details of how the write is discarded are implementation defined. If the IO protocol requires a response for write transactions (e.g., AXI) then a response as defined by the IO protocol may be generated by the IO bridge (e.g., SLVERR on BRESP - Write Response channel). For PCIe, for example, write transactions are posted and no response is returned when a write transaction is discarded. If the faulting transaction is a read then the device expects a completion. The IO bridge may provide a completion to the device. The data, if returned, in such completion is implementation defined; usually it is a fixed value such as all 0 or all 1. A status code may be returned to the device in the completion to indicate this condition. For AXI, for example, the completion status is provided by SLVERR on RRESP (Read Data channel). For PCIe, for example, the completion status field may be set to "Unsupported Request" (UR) or "Completer Abort" (CA).
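The protocol examples above can be summarized as a small lookup. This is a hedged sketch: the function name and the returned strings are hypothetical, and since the text describes these responses as things the IO bridge may do, an implementation is free to behave differently.

```python
from typing import Optional


def abort_response(protocol: str, is_write: bool) -> Optional[str]:
    """Return the response an IO bridge might give for an aborted transaction.

    None means no response is returned to the device.
    """
    if protocol == "AXI":
        # AXI requires a response for both writes and reads.
        return "SLVERR on BRESP" if is_write else "SLVERR on RRESP"
    if protocol == "PCIe":
        # PCIe writes are posted, so a discarded write gets no response.
        # Reads expect a completion, which may carry a UR or CA status.
        return None if is_write else "completion with UR or CA status"
    raise ValueError(f"protocol not covered by this sketch: {protocol}")


for proto in ("AXI", "PCIe"):
    for is_write in (True, False):
        print(proto, "write" if is_write else "read", "->",
              abort_response(proto, is_write))
```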
As part of its operations, MPTCHK may need to read data from the MPT. The provider of the data (a memory controller or a cache) may detect that the requested data has an uncorrectable error, signal that the data is corrupted, and defer the error to MPTCHK. This technique of deferring the handling of corrupted data to the consumer of the data is commonly known as data poisoning. The effects of such errors may be contained to the transaction that caused the corrupted data to be accessed. In the cases where the error affects the transaction being processed but otherwise allows the MPTCHK to continue providing service, MPTCHK may request the IO bridge to abort the transaction. The MPTCHK may support the RISC-V RAS error record register interface (RERI) that specifies methods for enabling error detection, logging the detected errors, and configuring means to report the error to an error handler. When such a RAS architecture is supported, errors such as attempted consumption of poisoned data may be reported using the methods provided by the RAS architecture.
Each I/O MPT checker (IOMPTCHK) register interface is memory-mapped starting at an 8-byte aligned physical address and includes the registers used to configure the SDCL and MPTCHK functions in the I/O MPT checker.
Note
|
Implementations may choose to implement a coarser alignment for the start address of the register interface. For example, some implementations may locate the register interface within a naturally aligned 4-KiB region (a page) of physical address space for each register interface. Coarser alignments may enable register decoding to be implemented without a hardware adder circuit. |
The behavior for register accesses where the address is not aligned to
the size of the access, or if the access spans multiple registers, or if the
size of the access is not 4 bytes or 8 bytes, is UNSPECIFIED. An aligned 4-byte
access to an IOMPTCHK register must be single-copy atomic. Whether an 8-byte
access to an IOMPTCHK register is single-copy atomic is UNSPECIFIED, and such
an access may appear, internally to the IOMPTCHK implementation, as if two
separate 4-byte accesses were performed.
Note
|
The IOMPTCHK registers are defined in such a way that software can perform two individual 4 byte accesses, or hardware can perform two independent 4 byte transactions resulting from an 8 byte access, to the high and low halves of the register as long as the register semantics, with regards to side-effects, are respected between the two software accesses, or two hardware transactions, respectively. |
The IOMPTCHK registers have little-endian byte order (even for systems where all harts are big-endian-only).
Note
|
Big-endian-configured harts that make use of I/O MPT may implement the |
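The split-access and byte-order rules can be sketched as follows. This is an illustration, not driver code: `FakeRegs` is a hypothetical byte-addressed model of a register block, and the register offset used is arbitrary. The point shown is that, with little-endian registers, the low 32 bits live at the register's base offset and the high 32 bits at base offset + 4.

```python
import struct
from typing import Tuple


def split_le64(value: int) -> Tuple[int, int]:
    """Return the (low, high) 32-bit halves of a 64-bit register value."""
    return value & 0xFFFF_FFFF, (value >> 32) & 0xFFFF_FFFF


class FakeRegs:
    """Hypothetical model of a register block with little-endian registers."""

    def __init__(self, size: int = 32) -> None:
        self.mem = bytearray(size)

    def write32(self, offset: int, value: int) -> None:
        if offset % 4:
            raise ValueError("misaligned access is UNSPECIFIED; refused here")
        struct.pack_into("<I", self.mem, offset, value)

    def read64(self, offset: int) -> int:
        return struct.unpack_from("<Q", self.mem, offset)[0]


REG = 16  # arbitrary 8-byte aligned register offset, for illustration only
regs = FakeRegs()
lo, hi = split_le64(0x0123_4567_89AB_CDEF)
regs.write32(REG, lo)      # low half at the register's base offset
regs.write32(REG + 4, hi)  # high half at base offset + 4
print(hex(regs.read64(REG)))
```

Software that uses two 4-byte accesses still has to respect any side effects of the register between the two accesses, as the note above points out.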
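Offset | Name | Size | Description | Optional? |
---|---|---|---|---|
0 | capabilities | 8 | Capabilities of the I/O MPT checker | No |
8 | control | 8 | Control register | No |
16 | operand-0 | 8 | Operand 0 register | No |
24 | operand-1 | 8 | Operand 1 register | No |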
The reset value is 0 for the following register fields.
- control: BUSY and STATUS fields
The reset value is UNSPECIFIED for all other registers and/or fields.
The capabilities register is a read-only register that holds the I/O MPT
checker capabilities.
{reg: [ {bits: 8, name: 'VER'}, {bits: 1, name: 'MPTM'}, {bits: 39, name: 'WPRI'}, {bits: 16, name: 'custom'} ], config:{lanes: 4, hspace:1024}}
The VER field holds the version of the specification implemented by the
I/O MPT checker. The low nibble is used to hold the minor version of the
specification and the upper nibble is used to hold the major version of the
specification. For example, an implementation that supports version 1.0 of the
specification reports 0x10.
The MPTM field indicates the supported MPT address protection schemes. If 1,
then the MPT modes for RV64 are supported; otherwise the MPT modes for RV32 are
supported.
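A minimal sketch of decoding the capabilities register, assuming the fields in the register diagram are listed from the least-significant bit upward (VER in bits 7:0, MPTM in bit 8, custom in bits 63:48). The function name and the dictionary keys are hypothetical.

```python
def decode_capabilities(value: int) -> dict:
    """Split a 64-bit capabilities value into its documented fields."""
    ver = value & 0xFF  # VER, bits 7:0
    return {
        "major": ver >> 4,                         # upper nibble of VER
        "minor": ver & 0xF,                        # low nibble of VER
        "rv64_mpt_modes": bool((value >> 8) & 1),  # MPTM, bit 8
        "custom": (value >> 48) & 0xFFFF,          # custom, bits 63:48
    }


# Version 1.0 with MPTM = 1 (MPT modes for RV64 supported).
print(decode_capabilities(0x110))
```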
The control register is used to control classification of DMA requests using
the identifiers associated with the DMA requests to determine the associated
IO supervisor domain ID (IOSDID) and the Machine-level Memory Protection Tables (MMPT).
Control register (control)
{reg: [ {bits: 8, name: 'OP (WARL)'}, {bits: 16, name: 'RULEID (WARL)'}, {bits: 8, name: 'WPRI'}, {bits: 7, name: 'STATUS (RO)'}, {bits: 1, name: 'BUSY (RO)'}, {bits: 24, name: 'WPRI'} ], config:{lanes: 8, hspace:1024}}
The OP field is used to instruct IOMPTCHK to perform an operation listed in
I/O MPT checker operations (OP). The RULEID field is the identifier of the rule
in the SDCL function to operate on. The RULEID value of 0 indicates that the
operation applies to all rules and is supported only if explicitly specified by
an operation.
I/O MPT checker operations (OP)
Operation | Encoding | Description |
---|---|---|
- | 0 | Reserved for future standard use. |
SET_ENTRY | 1 | Configure the rule identified by RULEID. |
GET_ENTRY | 2 | Read the configurations of a rule identified by RULEID. |
MPTINVAL | 3 | This operation ensures that stores to an MPT are observed by MPTCHK before subsequent implicit reads by MPTCHK to the corresponding MPT. |
IOFENCE | 4 | This command can be used to request that IOMPTCHK ensure that all previous read and write requests from devices that have already been processed by IOMPTCHK be committed to a global ordering point such that they can be observed by all RISC-V harts, IOMMUs and devices in the system. |
- | 5-127 | Reserved for future standard use. |
- | 128-255 | Designated for custom use. |
When the control register is written, IOMPTCHK may need to perform several
actions that may not complete synchronously with the write. A write to the
control register sets the BUSY bit to 1, indicating that IOMPTCHK is performing
the requested operation. The behavior of writing the control register when the
BUSY bit is 1 is UNSPECIFIED. Some implementations may ignore the second write
and others may perform the operation determined by the second write. Software
must verify that BUSY is 0 before writing control.
Note
|
An implementation that can always perform the requested operation synchronously
with the write to |
When the BUSY bit reads 0 the operation is complete and the STATUS field
provides a status value (control.STATUS field encodings) of the requested
operation.
control.STATUS field encodings
STATUS | Description |
---|---|
0 | Reserved |
1 | Operation was successfully completed. |
2 | Invalid operation (OP). |
3 | Operation requested for invalid RULEID. |
4 | Illegal/invalid operand encodings used. |
5-127 | Reserved for future standard use. |
128-255 | Designated for custom use. |
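The write-then-poll protocol for the control register can be sketched as below. Several things here are assumptions: the field positions are read off the register diagram from the least-significant bit upward (OP 7:0, RULEID 23:8, STATUS 38:32, BUSY 39), the mapping of operation names to encodings 1 through 4 is inferred from the descriptions in the operations table, and `FakeChecker` is a hypothetical toy device, not a model of any real implementation.

```python
OP_SET_ENTRY, OP_GET_ENTRY, OP_MPTINVAL, OP_IOFENCE = 1, 2, 3, 4  # assumed
STATUS_SUCCESS = 1


def encode_control(op: int, ruleid: int = 0) -> int:
    """Build a control value: OP in bits 7:0, RULEID in bits 23:8."""
    if not 0 <= op <= 0xFF or not 0 <= ruleid <= 0xFFFF:
        raise ValueError("OP or RULEID out of range")
    return op | (ruleid << 8)


def decode_control(value: int) -> dict:
    """Extract the fields of a control value read back from the device."""
    return {
        "op": value & 0xFF,
        "ruleid": (value >> 8) & 0xFFFF,
        "status": (value >> 32) & 0x7F,  # STATUS, bits 38:32
        "busy": (value >> 39) & 1,       # BUSY, bit 39
    }


def wait_idle(read_control, max_polls: int = 1000) -> dict:
    """Poll until BUSY reads 0 and return the decoded fields."""
    for _ in range(max_polls):
        fields = decode_control(read_control())
        if not fields["busy"]:
            return fields
    raise TimeoutError("BUSY did not clear")


def run_operation(read_control, write_control, op: int, ruleid: int = 0) -> int:
    """Issue one operation and return its STATUS value."""
    wait_idle(read_control)  # writing while BUSY is 1 is UNSPECIFIED
    write_control(encode_control(op, ruleid))
    return wait_idle(read_control)["status"]


class FakeChecker:
    """Toy device: reads as BUSY once after a write, then reports success."""

    def __init__(self) -> None:
        self.value = 0
        self.countdown = 0

    def write_control(self, v: int) -> None:
        self.value = (v & 0xFF_FFFF) | (1 << 39)  # latch OP/RULEID, set BUSY
        self.countdown = 2

    def read_control(self) -> int:
        if self.countdown:
            self.countdown -= 1
            if self.countdown == 0:  # operation finished: clear BUSY
                self.value = (self.value & 0xFF_FFFF) | (STATUS_SUCCESS << 32)
        return self.value


dev = FakeChecker()
print(run_operation(dev.read_control, dev.write_control, OP_IOFENCE))
```

The poll before the write matters as much as the poll after it: it is what keeps software from issuing a second write while an earlier operation is still in progress.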
Before requesting the SET_ENTRY operation using the control register,
software should program the fields of the operand-0 and operand-1 registers.
The SET_ENTRY operation utilizes the following fields from the operand-0 and
operand-1 registers: SRC_IDT, SRC_IDM, TEE_FLT, SRC_ID, IOMMU_ID, IOSDID,
MPT_MODE, SRL, SML, SQRID and PPN.
If multiple rules are programmed to match a transaction, the implementation may
act based on any one of those matching rules. However, if a transaction does not
match any of the rules, the IO Bridge is notified of this condition. The
subsequent behavior of the IO Bridge for unmatched transactions remains
UNSPECIFIED.
An implementation that performs the requested operation synchronously may
hardwire the BUSY bit to 0.
The GET_ENTRY operation ignores the contents of both the operand-0 and
operand-1 registers. If the GET_ENTRY operation is unsuccessful, the
contents of these registers are UNSPECIFIED. However, upon a successful
GET_ENTRY operation, the configurations of the rule identified by
control.RULEID are provided in the following fields: SRC_IDT, SRC_IDM,
TEE_FLT, SRC_ID, IOMMU_ID, IOSDID, MPT_MODE, SRL, SML, SQRID, and PPN. The
state of all other fields in the operand-0 and operand-1 registers is
UNSPECIFIED.
The contents of RULEID, operand-0 and operand-1 are disregarded by the
IOFENCE operation.
The MPTINVAL operation utilizes the IOSDID field of the operand-0 register and
utilizes the following fields from the operand-1 register: PPNV, PPN, IOSDIDV,
and S. The contents of RULEID and all other fields of the operand-0 and
operand-1 registers are disregarded by the MPTINVAL operation.
Note
|
If an identical |
The operand-0 register holds the input operands or the output results of
operations requested through control.OP.
operand-0
{reg: [ {bits: 4, name: 'SRC_IDT (WARL)'}, {bits: 2, name: 'SRC_IDM (WARL)'}, {bits: 2, name: 'TEE_FLT (WARL)'}, {bits: 24, name: 'SRC_ID'}, {bits: 8, name: 'IOMMU_ID (WARL)'}, {bits: 8, name: 'IOSDID (WARL)'}, {bits: 4, name: 'SRL'}, {bits: 4, name: 'SML'}, {bits: 4, name: 'SQRID'}, {bits: 4, name: 'WPRI'} ], config:{lanes: 8, hspace:1024}}
The SRC_IDT field identifies the type of identifier from the DMA transaction
used by this classification rule. The SRC_IDT encodings are listed in
operand-0.SRC_IDT field encodings.
operand-0.SRC_IDT field encodings
SRC_IDT | Description |
---|---|
0 | None. This rule does not match any incoming transaction. All other fields of the |
1 | Filter by device ID. The device ID is specified in |
2 | Filter by PCIe IDE stream ID and PCIe segment ID. The IDE stream ID is specified in the bits 7:0 of the |
3 - 7 | Reserved for future standard use. |
8 - 15 | Designated for custom use. |
Note
|
In PCIe systems, an originating device can be pinpointed using a unique 16-bit identifier. This identifier is a composite of the PCI bus number (8 bits), device number (5 bits), and function number (3 bits), collectively referred to as the routing identifier or RID. In scenarios where an IOMMU manages multiple hierarchies, there’s also an optional segment number, which can be up to 8 bits. Each hierarchy in this context represents a distinct PCI Express I/O interconnect topology. Here, the Configuration Space addresses, which are delineated by the Bus, Device, and Function number tuple, remain distinct. Sometimes, the term Hierarchy is synonymous with Segment. Especially when in Flit Mode, the Segment number can be part of a Function’s ID. |
The SRC_IDM field configures the SRC_ID matching mode for transactions. The
SRC_IDM encodings are listed in operand-0.SRC_IDM field encodings.
SRC_IDM | Description |
---|---|
0 | Reserved for future standard use. |
1 | Unary. If Unary is selected, then this rule matches if all the bits of the source ID of the transaction match the value configured in the |
2 | NAPOT. If NAPOT is selected, then the rule matches a naturally aligned power-of-two range of source IDs. In this mode, the lower bits of the |
3 | TOR. If TOR (Top-Of-Range) is selected, the |
Note
|
The following example illustrates the use of SRC_IDM with SRC_IDT set to Filter by device ID
|
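A minimal sketch of the Unary and NAPOT matching modes follows. It is not the example the note above refers to. The NAPOT encoding is an assumption: the sketch borrows the convention the text gives for the MPTINVAL PPN operand, where the low-order bits up to and including the first 0 bit encode the range size. The TOR mode is omitted because its definition is not available here. Function names are hypothetical.

```python
SRC_ID_BITS = 24  # width of the SRC_ID field in operand-0


def unary_match(rule_src_id: int, txn_src_id: int) -> bool:
    """Unary: every bit of the transaction's source ID must match."""
    return rule_src_id == txn_src_id


def napot_match(rule_src_id: int, txn_src_id: int) -> bool:
    """NAPOT (assumed encoding): k trailing 1 bits select 2**(k+1) IDs."""
    k = 0
    while k < SRC_ID_BITS and (rule_src_id >> k) & 1:
        k += 1
    # Ignore the k trailing ones and the 0 bit above them when comparing.
    mask = ~((1 << (k + 1)) - 1) & ((1 << SRC_ID_BITS) - 1)
    return (rule_src_id & mask) == (txn_src_id & mask)


# 0x0103 has two trailing ones, so under the assumed encoding it covers
# the 8 source IDs 0x0100 through 0x0107.
print(napot_match(0x0103, 0x0107), napot_match(0x0103, 0x0108))
```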
The TEE_FLT field may be used to filter transactions associated with a Trusted
Execution Environment (TEE). The encodings for the TEE_FLT field can be found
in operand-0.TEE_FLT field encodings.
operand-0.TEE_FLT field encodings
TEE_FLT | Description |
---|---|
0 | Reserved for future standard use. |
1 | Rule matches TEE-associated transactions. |
2 | Rule matches transactions that are not TEE associated. |
3 | Rule matches both TEE-associated and non-TEE associated transactions. |
Note
|
PCIe IDE provides security for transactions from one Port to another. These
transactions might be initiated by contexts within the device, such as an SR-IOV
virtual function, which are associated with a Trusted Execution Environment
(TEE). Within the IDE TLP header, there’s a "T" bit that helps differentiate
transactions related to a TEE. The Fields such as |
The IOMMU_ID field identifies the instance of the IOMMU that should be used to
provide address translation and protection for the transactions matching this
rule.
The IOSDID field identifies the supervisor domain whose memory is accessed by
this transaction. When operand-1.MPT_MODE is Bare, the SET_ENTRY operation
requires the IOSDID field to be zero.
The SRL and SML fields along with the operand-1.SSM field are used to determine
the effective RCID and MCID provided by the IOMMU for device-originated
requests. The determination of the effective RCID and MCID is as specified
by [SMQOSID]. The SQRID field identifies the QRI for requests originating from
the devices and the IOMMU associated with the SD and accompanies the RCID and
MCID in the requests made by the device to the QRI.
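The operand-0 layout can be captured as a table of field widths and packed generically. This is a sketch under the assumption that the register diagram lists fields from the least-significant bit upward; the helper names are hypothetical, and the example values (device-ID filter, Unary match, TEE filter value 3) are arbitrary choices for illustration.

```python
OPERAND0_FIELDS = [  # (name, width in bits), least-significant field first
    ("SRC_IDT", 4), ("SRC_IDM", 2), ("TEE_FLT", 2), ("SRC_ID", 24),
    ("IOMMU_ID", 8), ("IOSDID", 8), ("SRL", 4), ("SML", 4),
    ("SQRID", 4), ("WPRI", 4),
]


def pack_fields(layout, **values: int) -> int:
    """Pack named field values into one integer according to the layout."""
    unknown = set(values) - {name for name, _ in layout}
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    word, shift = 0, 0
    for name, width in layout:
        value = values.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_fields(layout, word: int) -> dict:
    """Split an integer back into its named fields."""
    fields, shift = {}, 0
    for name, width in layout:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


operand0 = pack_fields(OPERAND0_FIELDS, SRC_IDT=1, SRC_IDM=1, TEE_FLT=3,
                       SRC_ID=0x0100, IOMMU_ID=2, IOSDID=5)
print(hex(operand0))
```

Keeping the layout as data means the same two helpers can be reused for the other registers by supplying a different field list.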
The operand-1 register holds the input operands or the output results of
operations requested through control.OP.
operand-1
{reg: [ {bits: 4, name: 'MPT_MODE (WARL)'}, {bits: 1, name: 'PPNV (WARL)'}, {bits: 1, name: 'S (WARL)'}, {bits: 1, name: 'IOSDIDV'}, {bits: 1, name: 'SSM'}, {bits: 2, name: 'WPRI'}, {bits: 44, name: 'PPN'}, {bits: 10, name: 'WPRI'} ], config:{lanes: 8, hspace:1024}}
The MPT_MODE field identifies the mode of the MPT. It is interpreted as
detailed in [mpt-64] when capabilities.MPTM is 1, and as outlined in
[mpt-32] otherwise. The MPT_MODE field is programmed into the rule
identified by RULEID via the SET_ENTRY operation and can be retrieved by
the GET_ENTRY operation. Both the IOFENCE and MPTINVAL operations
disregard the MPT_MODE field.
The PPN field programs the PPN of the root page of the MPT during the
SET_ENTRY operation and is retrieved by the GET_ENTRY operation. When
MPT_MODE is Bare, the SET_ENTRY operation requires the PPN field to be
zero. The IOFENCE operation disregards this field.
For the MPTINVAL operation, the PPNV field indicates if the PPN field is
valid and the IOSDIDV field indicates if the IOSDID field is valid for the
operation. When a field is not valid for an operation, it is ignored by the
operation. When the PPNV field is 1, the S field sets the address range size
for the MPTINVAL operation. With an S field value of 0, the range size is
4 KiB. When the S field has a value of 1, the MPTINVAL operation applies to a
NAPOT range. This range is determined by the low-order bits of the PPN field,
up to and including the first low-order 0 bit. If the position of that first
low-order 0 bit is denoted as x, the size of the range is computed as
(1 << (12 + x + 1)). When PPNV is set to 1, if the address range specified by
PPN and S is invalid, the operation may or may not be performed. Operations
besides MPTINVAL disregard the PPNV field.
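The range computation can be worked through in a few lines. The size follows the formula in the text. The base address is an assumption: the sketch takes it to be the PPN with the encoding bits (positions 0 through x) cleared, which is the usual NAPOT convention. The function name is hypothetical.

```python
from typing import Tuple


def mptinval_range(ppn: int, s: int) -> Tuple[int, int]:
    """Return (base_address, size_in_bytes) selected by the PPN and S operands."""
    if s == 0:
        return ppn << 12, 1 << 12  # a single 4 KiB page
    x = 0
    while (ppn >> x) & 1:          # find the first low-order 0 bit
        x += 1
    size = 1 << (12 + x + 1)
    base = (ppn >> (x + 1)) << (x + 1 + 12)  # clear the encoding bits
    return base, size


for ppn, s in ((0x80000, 0), (0x80000, 1), (0x80003, 1)):
    base, size = mptinval_range(ppn, s)
    print(f"PPN={ppn:#x} S={s}: base={base:#x} size={size // 1024} KiB")
```

With S set to 1, a PPN whose lowest bit is 0 selects an 8 KiB range, and each additional trailing 1 bit doubles the size.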
The MPTINVAL operation ensures that stores to the MPT are observed by MPTCHK
before subsequent implicit reads by MPTCHK to the corresponding MPT.
MPTINVAL operands and operations
PPNV | IOSDIDV | Operation |
---|---|---|
0 | 0 | Invalidates information cached from any MPT for all supervisor domain address spaces. |
0 | 1 | Invalidates information cached from the MPT for the address space of the supervisor domain identified by the |
1 | 0 | Invalidates information cached from the MPT for the address range in the |
1 | 1 | Invalidates information cached from the MPT for the address range in the |
Note
|
The following example illustrates the use of
|
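The four operand combinations can be built with one small helper. This is a hedged sketch, not the example the note above refers to: the bit positions assume the register diagrams list fields from the least-significant bit upward (operand-0.IOSDID in bits 47:40; operand-1 PPNV bit 4, S bit 5, IOSDIDV bit 6, PPN bits 53:10), and the function name and its use of `None` for "not valid" are hypothetical.

```python
from typing import Optional, Tuple


def mptinval_operands(iosdid: Optional[int] = None,
                      ppn: Optional[int] = None,
                      s: int = 0) -> Tuple[int, int]:
    """Return (operand0, operand1) for MPTINVAL; None marks a field not valid."""
    operand0 = operand1 = 0
    if iosdid is not None:
        if not 0 <= iosdid <= 0xFF:
            raise ValueError("IOSDID does not fit in 8 bits")
        operand0 |= iosdid << 40   # operand-0.IOSDID
        operand1 |= 1 << 6         # operand-1.IOSDIDV = 1
    if ppn is not None:
        if not 0 <= ppn < (1 << 44) or s not in (0, 1):
            raise ValueError("PPN or S out of range")
        operand1 |= 1 << 4         # operand-1.PPNV = 1
        operand1 |= s << 5         # operand-1.S
        operand1 |= ppn << 10      # operand-1.PPN
    return operand0, operand1


print(mptinval_operands())                        # everything, all domains
print(mptinval_operands(iosdid=5))                # one supervisor domain
print(mptinval_operands(ppn=0x80000))             # one range, all domains
print(mptinval_operands(iosdid=5, ppn=0x80000))   # one range, one domain
```

After writing both operand registers, software would request the MPTINVAL operation through the control register and wait for BUSY to read 0.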
Note
|
Simpler implementations may ignore the operands of
A consequence of this specification is that an implementation may use any information for an address that was valid in the MPT at any time since the most recent
Another consequence of this specification is that it is generally unsafe to update the MPT using a set of stores of a width less than the width of the MPT entry, as it is legal for the implementation to read the MPT entries at any time, including when only some of the partial stores have taken effect.
The IOMMU itself is a DMA-capable device. The DMA performed by the IOMMU is performed using the device ID of the IOMMU. A rule must be defined to associate the IOMMU device ID itself with an |