diff --git a/chapter2.adoc b/chapter2.adoc index 208b3d2..c8d671f 100644 --- a/chapter2.adoc +++ b/chapter2.adoc @@ -1,51 +1,40 @@ [[chapter2]] == Summary of extensions for Supervisor Domain Access Protection -=== Architecture Extensions +The following normative architecture extensions are defined. -The following normative architecture extensions are defined. The following -sub-section describes the (informative) theory of operation. - -* `Smsdid` (<>) - An interface to signal the active supervisor domain +* `Smsdid` (<>) - An interface to program the active supervisor domain under which a hart is operating. This is a dynamic control state on the hart that can be held in an M-mode CSR and modifiable by the RDSM via CSR r/w instructions - herewith called the `supervisor domain identifier` assigned to -the hart. Supervisor domains are orthogonal to hart privilege levels and since -Smmtt enables physical memory isolation, there is one CSR (per hart) managed by -M-mode. `Smsdid` is expected to be used in conjunction with `Smmtt` for physical -memory isolation along with mechanisms such as `PMP` and `Smepmp`. Device side -accesses are addressed in the `IO-MTT` extension. Isolation of data within a -device is out of scope of this specification. - -* `Smmtt` (<>) - An interface to set the access permissions for a memory -region or page associated with a supervisor domain. This interface allows -dynamic changes of association (which may require appropriate flushing of any -state cached in harts). The association mapping is programmed via an Memory -Tracking Table (MTT) structure, accessed via per-hart M-mode CSRs and which may -be backed by additional in-memory structures. The M-mode CSR interface is -expected to program the root physical page (MTTPPN) - for when the MTT is a -memory-based structure, the MTTPPN would hold the physical address of the root -page of the MTT structure in memory - the MTT is expected to be memory resident -at time of access. 
Write access to MTT structures must be restricted by and to -the RDSM (except for when explicitly allowed by the RDSM). Privilege levels may -affect changes in the MTT under purview of the Supervisor Domain Security -Manager (SDSM) either through an SBI interface into M-mode (or may have the -ability to edit MTT structures by virtue of how the MTT structure in memory is -accessible to lower privilege levels). MTT and e(PMP) are always active. -MTT may be configured to be `Bare` if granular memory access control -is not required. The SDID -CSR defined by `Smsdid` is used as defined. - -* `IO-MTT` (<>) - This non-ISA interface enables programming of an IO -interconnect to associate SDID to IOMMU ID (called the SD Classifier). The +the hart. The SDID is a local identifier for the hart and may be used to tag +hart-local resources to access-control data associated with the supervisor +domain. The supervisor domain identifier is independent from the hart privilege +levels and is held in an M-mode CSR. This extension may be +used independently or may be combined with other extensions in this +specification. + +* `Smmtt` (<>) - An extension to set the access permissions for a memory +region or page associated with a supervisor domain. This extension allows for +dynamic changes of access permission. Such dynamic changes may require flushing of +appropriate state cached in harts. The access properties are programmed via a Memory +Tracking Table (MTT) structure. The physical page number (PPN) of the root table of +the MTT is programmed into an M-mode CSR. When `Smmtt` is implemented, MTT +and e(PMP) are always active. Although there is no option to disable MTT, it can be +effectively disabled if granular memory access control is not required by configuring +MTT mode to be `Bare`. + +* `IO-MTT` (<>) - A non-ISA extension that enables programming of an IO +interconnect to associate an IOMMU and devices in scope of that IOMMU with an SD. 
The assignment of IOMMUs to supervisor domains is also expected to be under the -purview of the RDSM. IO-MTT interface specifies the memory access interface for -physical-addresses encountered during IOMMU address translation as well for the -final physical address of access. - -* `Smsdia` (<>) - This M-mode CSR interface enables assignment of IMSIC -S-interrupt file or an APLIC domain to a Supervisor Domain. The interface also -describes CSRs to allow M-mode software to retain control on notification of +purview of the RDSM. IO-MTT extension specifies the memory access control mechanisms for +memory accesses performed by the IOMMU as well as by the devices associated with that SD. +Note that isolation of data within a device is +out of scope of this specification. + +* `Smsdia` (<>) - This extension enables assignment of IMSIC +interrupt file(s) or an APLIC domain to a supervisor domain. The extension also +provides CSRs to allow M-mode software to retain control on notification of interrupts when Supervisor domains are enabled. * `Smsdedbg` (<>) - This extension provides the controls to indicate @@ -53,138 +42,15 @@ if external debug is allowed for a supervisor domain. Whether external debug is authorized or not is expected to be done via a root of trust (RoT) and is outside the scope of this specification. -=== Theory of operation (informative) -Supervisor Domain Access Protection extensions are used by M-mode software to -program if physically-addressed memory (or device-mapped region) is -accessible (read/write) by a hart/device operating under the control of S-mode -software within a domain. Associating a hart/device with a supervisor domain -implies that any physical-addressable region access occurring in the context -of the supervisor domain is subject to access-checks for that domain. 
-Hence, software or hardware accesses that originate from other supervisor -domains other than the owner supervisor domain can be explicitly -prevented/allowed by using the Smmtt extension. The RDSM has access to physical -memory for all supervisor domains. - -Memory regions may be accessed by harts or by other devices on the platform. -When harts and devices are assigned to a supervisor domain, the hart/device is -said to perform memory accesses in the context of that supervisor domain. For -all accesses using a physical address, the SDID is the supervisor domain -identifier programmed into a CSR. This CSR is programmed on the hart by the -Root Domain Security Manager (RDSM). The assignment of the hart/device to a -supervisor domain may be static (e.g. device assignment to a VM) or dynamic -(e.g. scheduling a VM virtual cpu within a domain). The MTT for the supervisor -domain active on the hart is programmed on the hart along with the supervisor -domain identifier. The MTT does not perform any address translation; it simply -provides access permissions for the physically addressed region/page (post any -S-mode and/or G-stage address translation) to enforce the isolation properties -per the use case requirements (see <>). - -[caption="Figure {counter:image}: ", reftext="Figure {image}"] -[title= "MTT lookup for Supervisor Domain Access", id=mtt-lookup] -image::images/fig2.png[] - -The MTT checker is a functional block that looks up the MTT using the physical -address of the access as an index to retrieve the access permissions for the -supervisor domain. This checker thus enforces that for a load initiated by the -hart, the physical address is readable, and for a store initiated by the hart, -the physical address is also writable, else reports a fault. An access -violation is reported as a trap to the supervisor domain and may be handled by -the M-mode Root domain security manager. Such disallowed accesses are ideally -handled with no data divulged. 
This MTT checker may be implemented -as an MMU extension in the hart, and/or in the IO interconnect to check device -accesses. The MTT checker is designed to work together with the page-based -virtual memory (MMU, IOMMU) systems and Physical Memory Protection -(PMP, IOPMP) mechanisms. Read and Write permissions for memory are derived from -the page table, the PMP and the MTT - an access is allowed only when all -protection mechanisms allow the access. When paging is enabled, instructions -that access virtual memory may result in multiple physical-memory accesses, -including (implicit S-mode) accesses to the page tables. MTT checks also apply -to these implicit accesses - those accesses will be treated as reads for -translation and as writes when A/D bits are updated in page table entries when -`Svadu` is implemented. - -MTTs are checked by the MTT checker for all accesses to eligible -physical memory, including accesses that have undergone virtual to -physical memory translation, but excluding MTT structure accesses. The -MTT checker indexes the MTT using the physical address of the access to -retrieve the access permissions, and checks that the hart or device is allowed -to access the physical memory accessed. A mismatch of the access type and -the access permissions specified in the MTT entry that applies to the -accessed region is reported as a trap to the supervisor domain software or -to the RDSM and the access is -disallowed with no data divulged. As described above, to support -architectural virtual address page sizes, the MTT allows configuration -at those supported architectural page sizes. MTT violations manifest as -instruction, load, or store access-fault exceptions. The exception -conditions for MTT are checked when the access to memory is performed. - -MTT may be used to provide permissions for physical memory addresses -that hold regular main memory or IO memory. 
Memory may be assigned to -the RDSM to bootstrap the subsequent run-time lookup structures for MTT. -All memory should be covered by the MTT, though some memory may not be -eligible to be qualified for assignment to a specific supervisor domain. -This limitation may arise due to platform configuration and security -policies - for example, if the platform security policy requires memory -for a domain to be encrypted and some memory access paths are not -enforced via an inline memory encryption engine. It is expected that the -RDSM can use trusted platform-specific methods to enumerate which -regions can be designated as access-controlled via the MTT. - -MTT must support both static and run-time configurability. A memory -region (consisting of one or more pages) may be (re)assigned from one -domain to another at run-time e.g. this is done by revoking the -permission for one domain and assigning permissions to another domain. -Run-time configuration may be performed via M-mode CSRs and/or in-memory -structures. The in-memory structures used for MTT must themselves be -access-limited to the RDSM by use of the MTT structures to disallow any -supervisor domain from accessing the structures unless explicitly -delegated by the Root Domain Security Manager (RDSM) to a particular -domain (per use case policies). To support MTT dynamic reconfiguration, -an interface is expected to be provided to set the attributes by passing -requests to a trusted driver (in the RDSM) that can reconfigure the -memory region assignment. Converting memory regions assignment from one -domain to another might involve platform-specific operations based on -the enforcement mechanism, such as TLB/cache flushes, that must be -enforced by the RDSM and hardware. The RDSM is expected to change the -settings and flush caches if necessary, so the system is only incoherent -during the transition between domain assignment settings. This -transitory state should not be visible to lower privilege levels (i.e. 
-supervisor domains). There are also security aspects to be considered during -(re)configuration, e.g., clearing memory used by the current SD before -assigning it to another SD. Refer to the RISC-V CoVE cite:[CoVE] ABI and threat -model as a reference. - -A hart/device may perform accesses to memory exclusively accessible to it's -supervisor domain, or to memory shared globally with one or more supervisor -domains. Memory sharing between supervisor domains is achieved by simply making -the physical memory region accessible to the supervisor domains via the MTT -structure associated with the hart or device. Access to physical addresses -initiated from a hart or a device assigned a supervisor domain identifier may be -denied by virtue of the permissions in the MTT lookup - such disallowed accesses -cause a trap which may be reported to the supervisor domain software or to the -RDSM to report a fault. +* `Smsdetrc` (<>) - This extension provides the controls to indicate +if external trace is allowed for a supervisor domain. Whether external trace is +authorized or not is expected to be done via a root of trust (RoT) and is +outside the scope of this specification. -The intra-domain isolation of memory between two harts/devices belonging -to the same supervisor domain, but different tenant workloads, is -achieved via the use of MMU, (S)PMP, IOMMU and IOPMP depending on the -type of platform and the type of access. To successfully achieve this -isolation, the page table structures for a domain's workloads must be -managed by the Supervisor Domain Security Manager (SDSM) and the paging -structures must be located in memory exclusively-accessible only to the -Supervisor Domain. Additional security properties may be enforced based -on type (data fetch, instruction fetch, etc.) and locality (hart -supervisor domain identifier) of memory accesses as required for the -security policy specific to usages. 
An example policy may be to require -certain accesses to target only exclusively-owned domain memory. The MTT -checker may utilize the supervisor domain identifier or additional metadata -for the access to enforce such policies. The description of different types -of Supervisor Domain policies possible is outside the scope of this document. +* `Smsqosid` and CBQRI for Supervisor Domains (<>) - This extension +provides an interface for the RDSM to enforce that resource accesses from a +supervisor domain or the RDSM must not be observable by entities that are not +within their TCB using the resource usage monitors. Similarly, the resource +allocations for a supervisor domain or the RDSM must not be influenced by +entities outside their TCB. -Additional protection/isolation for memory associated with a supervisor domain -is orthogonal (and usage-specific). Such additional protection for memory may -be derived by the use of cryptography and/or access-control mechanisms. The -mechanisms chosen for these additional protection methods are independent of -Smmtt and may be platform-specific. The TCB of a particular supervisor domain -(and devices that are bound to it) may be independently evaluated via -attestation of the HW and SW TCB by a relying party using standard Public-Key -Infrastructure-based mechanisms. diff --git a/chapter3.adoc b/chapter3.adoc index f98a3d2..8cd675d 100644 --- a/chapter3.adoc +++ b/chapter3.adoc @@ -2,6 +2,12 @@ [[Smsdid]] == `Smsdid`: Supervisor Domain Identifier and Protection Register +`Smsdid` defines an interface to program the active supervisor domain +under which a hart is operating. The interface consists of M-mode CSRs `msdcfg` +and `mttp`. The SDID programmed via this interface is a local identifier for the +hart and may be used to tag hart-local resources to access-control data +associated with the supervisor domain. 
+ The `mttp` register is an `XLEN`-bit read/write register, formatted as shown in <> for `XLEN=32` and <> for `XLEN=64`, which controls physical address protection for supervisor domains. This register holds the @@ -45,11 +51,11 @@ an illegal instruction exception. <> shows the encodings of the `MODE` field when `XLEN=64`. When `mttp` `MODE=Bare`, supervisor physical addresses have no MTT-based protection across supervisor domains beyond the physical memory protection scheme described in -Section 3.7 of the RISC-V privileged architecture specification [1]. In this -case, the remaining fields (`SDID`, `MTTPPN`) in `mttp` must be set to zeros, -else generate a fault. When `XLEN=32`, the other valid settings for `MODE` are -`Smmtt34` and `Smmtt34rw`, to support allow/disallow and read-write access -permissions for 34-bit system physical addresses. +Section 3.7 of the RISC-V privileged architecture specification cite:[ISA]. In +this case, the remaining fields (`SDID`, `MTTPPN`) in `mttp` must be set to +zeros, else generate a fault. When `XLEN=32`, the other valid settings for +`MODE` are `Smmtt34` and `Smmtt34rw`, to support allow/disallow and read-write +access permissions for 34-bit system physical addresses. When `XLEN=64`, other than `BARE`, the other valid settings for `MODE` are `Smmtt[46, 56][rw]` to support read-write/access permissions for 46-bit and @@ -62,7 +68,9 @@ may define different interpretations of the other fields in `mttp`. 
[width="100%",cols="10%,14%,76%", options="header", id=mtt-32] |=== |Value |Name |Description -|0 |`Bare` |No inter-supervisor domain protection +|0 |`Bare` | No protection across supervisor domains beyond the physical memory +protection scheme described in Section 3.7 of the RISC-V privileged architecture +specification cite:[ISA] |1 |`Smmtt34` |Page-based supervisor domain protection for 34 bit physical addresses with access allowed/disallowed per page @@ -77,7 +85,9 @@ physical addresses with RW permissions per page [width="100%",cols="10%,14%,76%", options="header", id=mtt-64] |=== |Value |Name |Description -|0 |`Bare` |No inter-supervisor domain protection +|0 |`Bare` | No protection across supervisor domains beyond the physical memory +protection scheme described in Section 3.7 of the RISC-V privileged architecture +specification cite:[ISA] |1 |`Smmtt46` |Page-based supervisor domain protection for 46 bit physical addresses @@ -117,13 +127,6 @@ least-significant bits of `SDID` are implemented first: that is, if `SDIDLEN` > The `mttp` register is considered active for the purposes of the physical address protection algorithm unless the effective privilege mode is `M`. -Physical accesses that began while `mttp` was active are not required to -complete or terminate when `mttp` is no longer active, unless an `FENCE.MTT` -instruction matches the `SDID` (and optionally, `PA`) is executed. The -`FENCE.MTT` instruction must be used to ensure that updates to the `MTT` data -structures are observed by subsequent implicit reads to those structures by a -hart. - Note that writing `mttp` does not imply any ordering constraints between `S-mode` and `G-stage` page-table updates and subsequent address translations. If a supervisor domain's `MTT` structure has been modified, or if a `SDID` is @@ -140,9 +143,9 @@ configuration for supervisor domains: . 
`Smsdia` uses `msdcfg.SDICN` to specify the active configuration for the supervisor domain interrupt controller associated with the hart. -. `Smsdedbg` specifies the `msdcfg.sdedbgalw` bit to manage +. `Smsdedbg` specifies the `msdcfg.sdedbgalw` bit to manage external-debug for a supervisor domain. -. `Smsdetrc` specifies the `msdcfg.sdetrcalw` bit to manage +. `Smsdetrc` specifies the `msdcfg.sdetrcalw` bit to manage external-trace for a supervisor domain. . `Smqosid` specifies the control bits `SSM`, `SRL`, `SML` and `SQRID` to enable the RDSM to manage QoS controls for supervisor domains. @@ -150,10 +153,10 @@ configuration for supervisor domains: Details of `Smsdia`, `Smsdedbg`, `Smsdetrc` and `Smqosid` are described in their respective sections in this specification. -[[MSDCFG]] -.`msdcfg` register - -[wavedrom, , ] +[caption="Register {counter:rimage}: ", reftext="Register {rimage}"] +[title="`msdcfg` register"] +[id=MSDCFG] +[wavedrom, ,svg] .... {reg: [ {bits: 6, name: 'SDICN'}, @@ -166,3 +169,85 @@ respective sections in this specification. {bits: 4, name: 'SQRID'}, ], config:{lanes: 4, hspace:1024}} .... + +=== M-mode Supervisor Domain Fence Instruction + + +[caption="Figure {counter:image}: ", reftext="Figure {image}"] +[title="MFENCE.SPA instruction"] +[id=mfence-spa] +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode (SYSTEM)'}, + {bits: 5, name: 'rd (0)'}, + {bits: 3, name: 'func3 (PRIV)'}, + {bits: 5, name: 'rs1 (PADDR)'}, + {bits: 5, name: 'rs2 (SDID)'}, + {bits: 7, name: 'func7 (MFENCE.SPA)'}, +], config:{lanes: 1, hspace:1024}} +.... + +The `MFENCE.SPA` fence instruction is used to synchronize updates to supervisor +domain access-permissions with current execution. +`MFENCE.SPA` is only valid in M-mode. If operand rs1 is not equal to x0, it +specifies a single physical address, and if rs2 is not equal to x0, it specifies +a single SDID. 
Executing a `MFENCE.SPA` guarantees that any previous stores +already visible to the current hart are ordered before all implicit reads by +that hart done for supervisor domain access-permission structures for +non-M-mode instructions that follow the `MFENCE.SPA`. + +When SDID is specified in rs2, bits XLEN-1:SDIDMAX held in rs2 are reserved for +future standard use. Until their use is specified, they should be zeroed by +software and ignored by implementations. Also, if SDIDLEN < SDIDMAX, the +implementation shall ignore bits SDIDMAX-1:SDIDLEN of the value held in rs2. + +[NOTE] +==== +A simpler implementation of MFENCE.SPA may ignore the physical address in rs1, +and/or the SDID value in rs2, and always perform a global fence for all SDs. +==== + +=== M-mode Supervisor Domain Fine-Grain Invalidation Instruction + +In some high-performance implementations, a finer-granular invalidation and +fencing is required that allows for synchronization operations to be more +efficiently batched. When `Svinval` is implemented with `Smsdid`, the +`MINVAL.SPA` instruction must be implemented to support such fine-granular +invalidation of physical memory access-permission caches. + +[caption="Figure {counter:image}: ", reftext="Figure {image}"] +[title="MINVAL.SPA instruction"] +[id=minval-spa] +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode (SYSTEM)'}, + {bits: 5, name: 'rd (0)'}, + {bits: 3, name: 'func3 (PRIV)'}, + {bits: 5, name: 'rs1 (PADDR)'}, + {bits: 5, name: 'rs2 (SDID)'}, + {bits: 7, name: 'func7 (MINVAL.SPA)'}, +], config:{lanes: 1, hspace:1024}} +.... + +`MINVAL.SPA` is only ordered against `SFENCE.W.INVAL` and `SFENCE.INVAL.IR` +instructions. As part of the update to the SD access-permissions, the RDSM must +ensure that it uses `SFENCE.W.INVAL` to guarantee that any previous stores to +structures that hold supervisor domain access-permissions (e.g. `MTT`) are made +visible before invoking the `MINVAL.SPA`. 
The RDSM must then use +`SFENCE.INVAL.IR` to guarantee that all subsequent implicit references to +supervisor domain access-permission structures (e.g. `MTT`) are ordered to be +after the SD access-permissions cache invalidation. When executed in order (but +not necessarily consecutively) by a single hart, the sequence `SFENCE.W.INVAL`, +`MINVAL.SPA` and `SFENCE.INVAL.IR` has the same effect as a hypothetical +`MFENCE.SPA` in which: + +* the values of rs1 and rs2 for the `MFENCE.SPA` are the same as those used in +the `MINVAL.SPA`, +* reads and writes prior to the `SFENCE.W.INVAL` are considered to be those +prior to the `MINVAL.SPA`, and +* reads and writes following the `SFENCE.INVAL.IR` are considered to be those +subsequent to the `MFENCE.SPA` + +`MINVAL.SPA` is only valid in M-mode. diff --git a/chapter4.adoc b/chapter4.adoc index c7dd0e8..24935c2 100644 --- a/chapter4.adoc +++ b/chapter4.adoc @@ -221,28 +221,71 @@ follows: ], config:{lanes: 1, hspace:1024}} .... -=== Caching - -Implementations with virtual memory are permitted to cache translations and -permissions in address translation cache structures. Similarly, access -permissions from the `MTT` lookup may be cached. The `PMP` and `MTT` settings -for the resulting physical address may be checked (and possibly cached) at any -point between the address translation and the explicit memory access. If -caching is occuring, when the `MTT` settings are modified, `M-mode` software -must synchronize the cached `MTT` state with the virtual memory system and any -`PMP`, `MTT` or address-translation caches. This is accomplished by executing -an `SFENCE.VMA` instruction with `rs1=x0` and `rs2=x0`, or `HFENCE.GVMA` as -needed, after the `MTT` is modified. If page-based virtual memory is not -implemented, memory accesses check the `PMP` settings synchronously, but may -check `MTT` settings that are cached, so a `MTT` invalidation (`MTTINVAL`) -instruction is needed. 
When Svinval is implemented, `MTTINVAL` is only ordered -against `SFENCE.W.INVAL` and `SFENCE.INVAL.IR` instructions. As part of the -`MTT` update, the RDSM must ensure that it uses `SFENCE.W.INVAL` to guarantee -that any previous stores to `MTT` are made visible before invoking the -`MTTINVAL`. The RDSM must then use `SFENCE.INVAL.IR` to guarantee that all -subsequent implicit references to `MTT` are ordered to be after the `MTT` cache -invalidation. - -_[TBD - register interface for flushing all MTT cached entries, vs specific -physical address at page size granularity]._ - +=== Access Enforcement and Fault Reporting + +As shown in <>, MTT lookup composes with, but does not require, +page-based virtual memory (MMU, IOMMU) and physical memory protection mechanisms +(PMP, Smepmp, IOPMP). When paging is enabled, instructions that access virtual +memory may result in multiple physical-memory accesses, including (implicit +S-mode) accesses to the page tables. MTT checks also apply to these implicit +S-mode accesses - those accesses will be treated as reads for translation and as +writes when A/D bits are updated in page table entries when `Svadu` is +implemented. + +MTT is checked for all accesses to physical memory, unless the effective privilege +mode is M, including accesses that have undergone virtual to physical memory +translation, but excluding MTT structure accesses. Data accesses in M-mode +when the MPRV bit in mstatus is set and the MPP field in mstatus contains S +or U are subject to MTT checks. MTT structure accesses are to be treated +as implicit M-mode accesses and are subject to PMP/Smepmp and +IOPMP checks. The MTT checker indexes the MTT using the +physical address of the access to lookup and enforce the access permissions. +A mismatch of the access type and the access permissions specified in the +MTT entry that applies to the accessed region is reported as a trap to the +RDSM which may report it to a supervisor domain. 
To enable composing +with Sv modes, the MTT supports configuration at supported architectural +page sizes. MTT violations manifest as instruction, load, or store access-fault +exceptions. The exception conditions for MTT are checked when the access +to memory is performed. + +=== Caching of MTT and Supervisor Domain Fence Instruction + +<> describes the canonical behavior of the `MFENCE.SPA` instruction +to invalidate cached access-permissions for all supervisor domains, a specific +supervisor domain, or a specific physical address for a supervisor domain. + +<> implemented with `Svinval` describes a finer granular invalidation +of access-permission caches. + +When `Smmtt` is implemented, an `MTT` structure is used to specify +access-permissions for physical memory for a supervisor domain. The `MTT` +settings for the resulting physical address (after any address translation) may +be checked (and possibly cached) at any point between the address translation +and the explicit memory access. If caching is occurring, when the `MTT` settings +are modified, `M-mode` software must synchronize the cached `MTT` state with the +virtual memory system and any `PMP`, `MTT` or address-translation caches, as +described via <> or in a batched manner via <>. + +When used with the `MTT`, the `MFENCE.SPA` is used to synchronize updates to +in-memory MTT structures with current execution. `MFENCE.SPA`, in this case, +applies only to the memory tracking table data structures controlled by the +CSR `mttp`. Executing a `MFENCE.SPA` guarantees that any previous stores already +visible to the current hart are ordered before all implicit reads by that hart +done for the `MTT` for non-M-mode instructions that follow the `MFENCE.SPA`. + +When `MINVAL.SPA` is used, access-permission cache synchronization may be +batch optimized via the use of the sequence `SFENCE.W.INVAL`, `MINVAL.SPA` and +`SFENCE.INVAL.IR`. 
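Editorial illustration (not part of the patch): the operand-matching rules for `MFENCE.SPA` and `MINVAL.SPA` can be sketched as a toy software model of an access-permission cache. This is a minimal sketch under stated assumptions, not the specified hardware behavior. The names `PermCache`, `fill`, `invalidate`, `effective_sdid` and the 4 KiB `PAGE_SHIFT` are hypothetical and chosen only for this example. `None` stands in for the register `x0`, and the ordering effects of `SFENCE.W.INVAL` and `SFENCE.INVAL.IR` are not modeled.

```python
PAGE_SHIFT = 12  # assumption for this sketch: permissions cached per 4 KiB page


def effective_sdid(rs2_value, sdidlen):
    """Keep only the implemented low SDIDLEN bits of the value held in rs2.

    Higher bits (reserved bits and unimplemented SDID bits) are ignored,
    as the text requires of implementations.
    """
    return rs2_value & ((1 << sdidlen) - 1)


class PermCache:
    """Hypothetical model of cached access permissions, keyed by (SDID, page)."""

    def __init__(self):
        self.entries = {}  # (sdid, page number) -> permission string

    def fill(self, sdid, paddr, perms):
        """Record permissions as if an MTT lookup had been cached."""
        self.entries[(sdid, paddr >> PAGE_SHIFT)] = perms

    def invalidate(self, paddr=None, sdid=None):
        """Drop entries matching the operands; None models x0 (match all)."""
        page = None if paddr is None else paddr >> PAGE_SHIFT
        self.entries = {
            (s, p): perms
            for (s, p), perms in self.entries.items()
            if not ((sdid is None or s == sdid) and (page is None or p == page))
        }


# Batched use: several targeted invalidations between one pair of fences.
cache = PermCache()
cache.fill(1, 0x1000, "rw")
cache.fill(1, 0x2000, "r")
cache.fill(2, 0x1000, "rw")
# (SFENCE.W.INVAL would be executed here)
cache.invalidate(paddr=0x1000, sdid=1)  # one physical address for one SD
cache.invalidate(sdid=2)                # everything cached for SD 2
# (SFENCE.INVAL.IR would be executed here)
```

A simpler implementation, as the specification's note permits, could treat every call as `invalidate()` with no operands and drop all entries; over-invalidation is always safe in this model.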
+ +[NOTE] +==== +MTT lookups that began while `mttp` was active are not required to complete or +terminate when `mttp` is no longer active, unless a `MFENCE.SPA` instruction that +matches the `SDID` (and optionally, `PADDR`) is executed. The `MFENCE.SPA` +instruction must be used to ensure that updates to the `MTT` data structures are +observed by subsequent implicit reads to those structures by a hart. +==== + +If `mttp.MODE` is changed for a given SDID, a `MFENCE.SPA` with rs1=x0 and rs2 +set either to x0 or the given SDID, must be executed to order subsequent PA +access checks with the `MODE` change, even if the old or new `MODE` is `Bare`. diff --git a/chapter8.adoc b/chapter8.adoc index 5c375ad..b35dad0 100644 --- a/chapter8.adoc +++ b/chapter8.adoc @@ -1,5 +1,6 @@ [[chapter8]] [[Smsdedbg]] +[[Smsdetrc]] == Supervisor Domain External Trace and Debug This chapter describes two extensions `Smsdedbg` and `Smsdetrc` that enable a @@ -42,7 +43,8 @@ configuration held in `msdcfg.sdedbgalw`, as described below: When `msdcfg.sdedbgalw` is 0: -* Access by external debuggers to the memory and/or state of the supervisor domain is disallowed. +* Access by external debuggers to the memory and/or state of the supervisor + domain is disallowed. * Entry to Debug Mode from a supervisor domain is disallowed. @@ -51,8 +53,8 @@ When `msdcfg.sdedbgalw` = 1 then external debug of privilege modes less than with the additional requirements listed below. + . External debug must be able to access supervisor domain memory and/or state. - In this context, "state" includes all supervisor domain resources accessible per the - Debug specification cite:[ExtDbg]. + In this context, "state" includes all supervisor domain resources accessible + per the Debug specification cite:[ExtDbg]. . Entry to Debug Mode from a supervisor domain is allowed. 
To enforce the above controls specified by this extension, the following @@ -75,7 +77,8 @@ When M-mode external trace is disabled, whether execution at privilege modes less than `M-mode` may be traced by an external trace tool depends on the configuration held in `msdcfg.sdetrcalw`, as described below: -When `msdcfg.sdetrcalw` = 0, external trace of the supervisor domain is disallowed. +When `msdcfg.sdetrcalw` = 0, external trace of the supervisor domain is +disallowed. When `msdcfg.sdetrcalw` = 1 then external trace of privilege modes less than `M-mode` shall be allowed for the SD on a per hart basis, with the diff --git a/glossary.adoc b/glossary.adoc index 544428f..60ad7e9 100644 --- a/glossary.adoc +++ b/glossary.adoc @@ -1,39 +1,43 @@ [[glossary]] == Glossary +[cols="1,4"] |=== -| AIA | RISC-V Advanced Interrupt Architecture (AIA) cite:[AIA] -interrupts. - | ABI | Application binary interface (ABI). +| AIA | RISC-V Advanced Interrupt Architecture (AIA) cite:[AIA] interrupts. + | AP | Application processors (AP)s can support commodity operating systems, - hypervisors/VMMs and applications software workloads. The AP subsystem - may contain several processing units, on-chip caches, and other controllers +hypervisors/VMMs and applications software workloads. The AP subsystem +may contain several processing units, on-chip caches, and other controllers for interfacing with memory, accelerators, and other fixed-function logic. Multiple APs may be used within a logical system. | Attestation | The process by which a relying party can assess the -trustworthiness of the confidential computing environment based on verifying a set of -evidences that are cryptographically-protected by hardware root-of-trust. - -| Confidential Computing | A computing paradigm that protects data in use by -performing computation in a hardware-based, attested TEE. 
- -| CoVE | Confidential VM extension (CoVE) is the set of RISC-V ABI extensions -defined in cite:[CoVE] that enables confidential computing on RISC-V -platforms. In some deployment models, the CoVE ABI leverages the RISC-V ISA -extensions specified in the RISC-V Supervisor Domains specification. - -| Host supervisor domain | All host software elements including OS and type-1 or -type-2 VMM and hosted VMs operate in a hosting supervisor domain. The hosting -supervisor domain hosts multiple distrusting supervisor domains, that may each -host their own software and applications. - -| Hypervisor | is software running in HS-mode that manages virtual machines -(VMs) by virtualizing hart, guest physical memory and input/output (IO) -resources. +trustworthiness of the confidential computing environment based on verifying a +set of evidences that are cryptographically endorsed by a hardware +root-of-trust. + +| Confidential Computing | A computing paradigm that protects data-in-use by +performing computation in a hardware-based, attested, execution environment. + +| CoVE | **Co**nfidential **V**M **E**xtension is the set of RISC-V ABI +extensions defined in cite:[CoVE] that enables confidential computing for +hardware virtual machines (VMs) on RISC-V platforms. + +| HW RoT | Hardware (HW) Root of trust (RoT) is the isolated hardware/software +subsystem with an immutable ROM firmware and isolated compute and memory +elements that form the Trusted Compute Base (TCB) of a TEE system. The RoT +manages cryptographic keys and other security critical functions such as system +lifecycle and debug authorization. The RoT provides trusted services to other +software, for which it is the TCB, such as verified boot, key management, +security lifecycle management, sealed storage, device management, crypto +services, attestation etc. The RoT may be an integrated or discrete element, and +may be used to manage device identities for attestation.
+ +| Hypervisor | Software running in HS-mode that manages virtual machines (VMs) +by virtualizing hart, guest physical memory and input/output (IO) resources. | IMSIC | Incoming Message-signaled Interrupt Controller (IMSIC). @@ -46,29 +50,15 @@ resources. | Relying party | An entity that An entity that uses the attestation process to assesses the trustworthiness of an attester. -| Supervisor Domains | This is a RISC-V privileged architecture -extension, define in this specification, to support physical address -space (memory and devices) isolation for -more than one supervisor domain. Supervisor domains enable the reduction of the -supervisor Trusted Computing Base (TCB), with differentiated access to memory and -other platform resources. - -| HW RoT | Hardware Root of trust (RoT) is the isolated hardware/software subsystem with an -immutable ROM firmware and isolated compute and memory elements that form the -Trusted Compute Base (TCB) of a TEE system. The RoT manages cryptographic keys -and other security critical functions such as system lifecycle and debug -authorization. The RoT provides trusted services to other software, -for which it is the TCB, on the -platform such as verified boot, key provisioning, and management, security -lifecycle management, sealed storage, device management, crypto services, -attestation etc. The RoT may be an integrated or discrete element, and may be -used to manage device identies for attestation. - -| Tenant workload | All software elements owned and deployed by a single -organization that may be hosted by a platform operator e.g. cloud provider -on a platform that can host more than one organizations workload simultaneously. -For example, in a virtualizated environment, the tenant workload elements may -include VS-mode guest kernel and VU-mode guest user-space applications. 
+| Supervisor Domain (SD) | A RISC-V privileged architecture extension defined in +this specification, to support isolation across more than one supervisor +execution context. Supervisor domains enable the reduction of the supervisor +Trusted Computing Base (TCB), with differentiated access to memory and other +platform resources. All host software elements including OS and type-1 or +type-2 VMM and hosted VMs operate in a "hosting" supervisor domain. The hosting +supervisor domain may interact with multiple distrusting supervisor domains via +the support of a root domain security manager. The alternate supervisor domains +may each host their own software and applications. | TCB; Also, System/Platform TCB | Trusted computing base (TCB) is the hardware, software, and firmware elements that are trusted by a relying party to protect @@ -77,11 +67,18 @@ execution against a defined adversary model. In a system with separate processing elements within a package on a socket, the TCB boundary is the package. In a multi-socket system the Hardware TCB extends across the socket-to-socket interface, and is managed as one system TCB. The software TCB -may also extends across multiple sockets. +may also extend across multiple sockets. | TEE | Trusted execution environment (TEE) is a set of hardware and software mechanisms that allow creating attestable and isolated execution environment. +| Tenant workload | All software elements owned and deployed by a single +organization that may be hosted by a platform operator e.g. cloud provider +on a platform that can host more than one organization's workload simultaneously. +For example, in a virtualized environment, the tenant workload elements may +include VS-mode guest kernel and VU-mode guest user-space applications. Tenant +workloads may also operate in the context of one or more supervisor domains. + | VM | An efficient, isolated duplicate of a real computer system.
In this specification it refers to the collection of resources and state that is accessible when a RISC-V hart supporting the hypervisor extension diff --git a/header.adoc b/header.adoc index 2a48f35..69f5bde 100644 --- a/header.adoc +++ b/header.adoc @@ -1,8 +1,8 @@ [[header]] :description: RISC-V Supervisor Domains Access Protection :company: RISC-V.org -:revdate: 4/2024 -:revnumber: 1.0.81 +:revdate: 5/13/2024 +:revnumber: 1.0.82 :revremark: This document is in development. Assume everything can change. See http://riscv.org/spec-state for details. :url-riscv: http://riscv.org :doctype: book @@ -60,7 +60,9 @@ Copyright 2024 by RISC-V International. [preface] include::contributors.adoc[] +[preface] include::glossary.adoc[] +:!chapter-signifier: include::intro.adoc[] include::chapter2.adoc[] include::chapter3.adoc[] diff --git a/images/fig2.png b/images/fig2.png index 09f845f..e3a2810 100644 Binary files a/images/fig2.png and b/images/fig2.png differ diff --git a/intro.adoc b/intro.adoc index 1dadcd5..d92abd8 100644 --- a/intro.adoc +++ b/intro.adoc @@ -2,6 +2,8 @@ == Introduction +=== Motivation and Goals + RISC-V privileged architecture cite:[ISA] defines execution mode for supervisor software called S-mode. S-mode software may optionally enable Hypervisor extension to host virtual machines. Typically, there is a single supervisor @@ -11,8 +13,8 @@ extension to support physical address space (memory and devices) isolation for more than one supervisor domain. Supervisor domains enable trusted execution use cases for RISC-V platforms. Supervisor domains may also be used to reduce the supervisor Trusted Computing Base (TCB), with differential access to memory -and other platform resources e.g. in Confidential VM Extension (CoVE), TEE -Security Services, Secure Devices etc. +and other platform resources e.g. in Confidential Computing, TEE Security +Services, Secure Devices etc. 
Tenant (application or VM) workloads on multi-tenant platforms rely on hardware-based isolation primitives that are managed by the host/privileged @@ -38,14 +40,16 @@ image::images/fig1.png[] A supervisor domain is associated with a set of physical address regions that are isolated from other supervisor domains on the same platform, with only the Root Domain Security Manager (RDSM) with access to all of the physical address -space. A supervisor domain identifier (SDID) is associated with the supervisor -domain to facilitate physical address protection fences on a per supervisor -domain basis. Supervisor domains must rely on a TCB which consists of the RDSM -(software) and hardware (hart, SoC, Root-of-trust) that enforces the isolation -properties for the supervisor domain. Isolation of the workloads within a -supervisor domain is the responsibility of the OS/hypervisor managing the -supervisor domain, here referred to as the Supervisor Domain Security Manager -(SDSM). +space. A supervisor domain identifier (SDID) is associated with the hart +operating in the context of a supervisor domain to facilitate physical address +protection fences on a per supervisor domain basis. Supervisor domains must rely +on a TCB which consists of the RDSM (software) and hardware (hart, SoC, +Root-of-trust) that enforces the isolation properties for the supervisor domain. +The RDSM may utilize PMP/ `Smepmp` and/or the `Smmtt` (Memory Tracking Table) +extension described in this specification to isolate physical memory between +supervisor domains. Isolation of the workloads within a supervisor domain is the +responsibility of the OS/hypervisor managing the supervisor domain, here +referred to as the Supervisor Domain Security Manager (SDSM). A key goal of using multiple domains is to be able to reduce the common TCB across domains, and should enable the attestation cite:[CCC] of each domain @@ -69,11 +73,130 @@ with attestation of the TCB. assign resources to other domains. 
* A service-provider domain that has exclusive access to some devices. -In order to avoid re-factoring of deployed host software, workloads and -applications, new hardware primitives are required to support memory isolation -for domains. A second key requirement the new hardware primitives must address -is the performance and scalability of physical memory isolation at a page-level -to support rich-OS memory management models. This specification describes the -architecture primitives to support the requirements of a multi-supervisor -domain physical address isolation model via a Supervisor Domain Access -Protection (Smmtt) extension for RISC-V processor-based platforms. +In order to avoid re-factoring of deployed software, workloads and +applications, new hardware primitives are required to support flexible isolation +of data in caches and memory. The new primitives are also required to isolate +resources such as interrupts, IO, QoS mechanisms and debug/trace mechanisms for +robust isolation of supervisor domains. The hardware primitives must support +performant and scalable physical memory isolation at a page-level to support +rich-OS memory management models. This specification describes the set of +architecture extensions to support the requirements for supervisor domain +isolation for RISC-V processor-based platforms. + +=== Memory Isolation - Theory of operation (informative) + +Supervisor Domain Access Protection extensions are used by M-mode RDSM to +program access policies for supervisor domain operation. The `Smmtt` extension +enables the RDSM to program permissions for physically-addressed memory (or +device-mapped regions) by a hart/device operating within a supervisor domain. +Associating a hart/device with a supervisor domain implies that any +physical-addressable region access occurring in the context of the supervisor +domain is subject to access-checks for that domain. 
Hence, software or hardware +accesses that originate from supervisor domains other than the allowed +supervisor domain can be explicitly prevented/allowed. The RDSM has access to +physical memory for all supervisor domains. In typical security usages, write +accesses to the MTT structures must be restricted and managed by the RDSM. + +Memory regions may be accessed by harts or by other devices on the platform. +When harts and devices are assigned to a supervisor domain, the hart/device is +said to perform memory accesses in the context of that supervisor domain. For +all accesses using a physical address, the SDID is the supervisor domain +identifier programmed into a CSR. This CSR is programmed on the hart by the +Root Domain Security Manager (RDSM). The assignment of the hart/device to a +supervisor domain may be static (e.g. device assignment to a VM) or dynamic +(e.g. scheduling a VM virtual cpu within a domain). The MTT for the supervisor +domain active on the hart is programmed on the hart along with the supervisor +domain identifier. The MTT does not perform any address translation; it simply +provides access permissions for the physically addressed region/page (post any +S-mode and/or G-stage address translation) to enforce the isolation properties +per the use case requirements (see <>). + +[caption="Figure {counter:image}: ", reftext="Figure {image}"] +[title= "MTT lookup for Supervisor Domain Access", id=mtt-lookup] +image::images/fig2.png[] + +The MTT checker is a functional block that looks up the `MTT` using the physical +address of the access as an index to retrieve the access permissions for the +supervisor domain. This checker thus enforces that for a load initiated by the +hart, the physical address is readable, and for a store initiated by the hart, +the physical address is also writable, else reports a fault. An MTT access +violation is always reported as a trap to the `M-mode` RDSM. 
The MTT checker may +be implemented as an MMU extension in the hart, and/or in the IO interconnect to +check device accesses. The MTT checker is designed to work together with the +page-based virtual memory (MMU, IOMMU) systems and Physical Memory Protection +(`PMP`, `Smepmp`, `IOPMP`) mechanisms. Read and Write permissions for memory are +derived from the page table, the `PMP` and the `MTT` - an access is allowed only +when all protection mechanisms allow the access. + +MTT may be used to provide permissions for physical memory addresses +that hold regular main memory or IO memory. Memory may be assigned to +the RDSM to bootstrap the subsequent run-time lookup structures for MTT. +All memory should be covered by the MTT, though some memory may not be +eligible to be qualified for assignment to a specific supervisor domain. +This limitation may arise due to platform configuration and security +policies - for example, if the platform security policy requires memory +for a domain to be encrypted and some memory access paths are not +enforced via an inline memory encryption engine. It is expected that the +RDSM can use trusted platform-specific methods to enumerate which +regions can be designated as access-controlled via the MTT. + +MTT must support both static and run-time configurability. A memory +region (consisting of one or more pages) may be (re)assigned from one +domain to another at run-time e.g. this is done by revoking the +permission for one domain and assigning permissions to another domain. +Run-time configuration may be performed via M-mode CSRs and/or in-memory +structures. The in-memory structures used for MTT must themselves be +access-limited to the RDSM by use of the MTT structures to disallow any +supervisor domain from accessing the structures unless explicitly +delegated by the Root Domain Security Manager (RDSM) to a particular +domain (per use case policies). 
To support MTT dynamic reconfiguration, +an interface is expected to be provided to set the attributes by passing +requests to a trusted driver (in the RDSM) that can reconfigure the +memory region assignment. Converting memory region assignment from one +domain to another might involve platform-specific operations based on +the enforcement mechanism, such as TLB/cache flushes, that must be +enforced by the RDSM and hardware. The RDSM is expected to change the +settings and flush caches if necessary, so the system is only incoherent +during the transition between domain assignment settings. This +transitory state should not be visible to lower privilege levels (i.e. +supervisor domains). There are also security aspects to be considered during +(re)configuration, e.g., clearing memory used by the current SD before +assigning it to another SD. Refer to the RISC-V CoVE cite:[CoVE] ABI and threat +model as a reference. + +A hart/device may perform accesses to memory exclusively accessible to its +supervisor domain, or to memory shared globally with one or more supervisor +domains. Memory sharing between supervisor domains is achieved by simply making +the physical memory region accessible to the supervisor domains via the MTT +structure associated with the hart or device. Access to physical addresses +initiated from a hart or a device assigned a supervisor domain identifier may be +denied by virtue of the permissions in the MTT lookup - such disallowed accesses +from a hart cause a trap to the RDSM to report a fault. In the case of a device +access disallowed by the MTT, the IO sub-system may log an error for the RDSM +which may delegate it to a supervisor domain. + +The intra-domain isolation of memory between two harts/devices belonging +to the same supervisor domain, but different tenant workloads, may be +achieved via the use of MMU, PMP/Smepmp, SPMP, IOMMU and IOPMP depending on the +type of platform and the type of access.
To successfully achieve this +isolation, the page table structures for a domain's workloads must be +managed by the Supervisor Domain Security Manager (SDSM) and the paging +structures must be located in memory exclusively-accessible only to the +supervisor domain. Additional security properties may be enforced based +on type (data fetch, instruction fetch, etc.) and locality (hart +supervisor domain identifier) of memory accesses as required for the +security policy specific to usages. An example policy may be to require +certain accesses to target only exclusively-owned domain memory. The MTT +checker may utilize the supervisor domain identifier or additional metadata +for the access to enforce such policies. The description of different types +of supervisor domain policies possible is outside the scope of this document. + +Additional protection/isolation for memory associated with a supervisor domain +is orthogonal (and usage-specific). Such additional protection for memory may +be derived by the use of cryptography and/or access-control mechanisms. The +mechanisms chosen for these additional protection methods are independent of +Smmtt and may be platform-specific. The TCB of a particular supervisor domain +(and devices that are bound to it) may be independently evaluated via +attestation of the HW and SW TCB by a relying party using standard Public-Key +Infrastructure-based mechanisms. +
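
Reviewer illustration (not part of the patch): the theory-of-operation text added to `intro.adoc` states that the MTT performs no address translation, that it is looked up by physical address for the supervisor domain active on the hart, and that an access is allowed only when the page table, the `PMP` and the `MTT` all allow it. The Python sketch below models only that final decision. Every name in it (`Perm`, `PAGE_SHIFT`, `mtt_lookup`, `access_allowed`) and the dictionary layout keyed by SDID and 4 KiB physical page number are assumptions made for illustration; the specification defines the real MTT format and lookup elsewhere.

```python
from enum import Flag


class Perm(Flag):
    """Hypothetical permission bits; not an encoding from the specification."""
    NONE = 0
    R = 1
    W = 2
    X = 4


PAGE_SHIFT = 12  # assumption for this sketch: 4 KiB pages


def mtt_lookup(mtt, sdid, paddr):
    """Return the permissions the modelled MTT grants `sdid` for `paddr`.

    `mtt` is a plain dict: sdid -> {physical page number -> Perm}.
    A page with no entry grants nothing. The physical address is used
    as-is: no translation is performed here.
    """
    return mtt.get(sdid, {}).get(paddr >> PAGE_SHIFT, Perm.NONE)


def access_allowed(required, page_table_perm, pmp_perm, mtt_perm):
    """Allow the access only if every mechanism grants all required bits."""
    granted = page_table_perm & pmp_perm & mtt_perm
    return (granted & required) == required


# Hypothetical setup: SDID 1 owns one read/write page, SDID 2 owns nothing.
example_mtt = {1: {0x80000: Perm.R | Perm.W}}
rw = Perm.R | Perm.W

store_ok = access_allowed(Perm.W, rw, rw, mtt_lookup(example_mtt, 1, 0x80000123))
store_other_sd = access_allowed(Perm.W, rw, rw, mtt_lookup(example_mtt, 2, 0x80000123))
```

In this model `store_ok` is true because all three mechanisms grant write, while `store_other_sd` is false because the MTT grants SDID 2 nothing for that page. In hardware, the denied case corresponds to the fault that the text says is reported to the RDSM.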