
Clarify which profiles to include in a Collection element #365

Closed
willarmiros opened this issue Jun 1, 2023 · 11 comments
Labels: serialization (Something about the representation of data in bytes)

@willarmiros
Contributor

Copying this question to this repo from spdx/tools-python#680 per @armintaenzertng's suggestion:

Which Profile(s) should I use in CreationInfo? If I have a SpdxDocument element, which in turn contains several other elements through relationships, should I list only the Core profile in the SpdxDocument's creationInfo, or should I include all the profiles of its sub-elements?

My interpretation of the current spec for the profile is that I would only include the Core profile. However, if there is only 1 creationInfo in the whole document, would it make sense to include all profiles (e.g. Software, AI, Dataset) of elements contained within?

If this change makes sense, I can make a PR to update the description of the Profile property!

@zvr
Member

zvr commented Jun 1, 2023

We discussed this in the serialization meeting today. The general question is: in any Collection element, do we put only the profile for this element (may be only Core), or do we add all the profiles of all elements that are included in the collection?

This is a separate discussion from whether there is one or many creationInfo structures.

One of the serialization examples we plan to use for testing covers exactly this case.

@willarmiros willarmiros changed the title Clarify which profiles to include in CreationInfo Clarify which profiles to include in a Collection element Jun 1, 2023
@willarmiros
Contributor Author

Thanks for the follow up @zvr! So to be clear: is the jury still out on what the correct behavior is here? If so we can keep this issue open to track the question, and FWIW I would vote in favor of including all of the profiles of sub-elements instead of just 1.

@goneall
Member

goneall commented Jun 2, 2023

"FWIW I would vote in favor of including all of the profiles of sub-elements instead of just 1"

Me too

@goneall goneall added this to the 3.0-rc2 milestone Jun 3, 2023
@goneall goneall added the serialization Something about the representation of data in bytes label Jun 3, 2023
@davaya
Contributor

davaya commented Jun 21, 2023

I'd vote that every element, whether a collection or not, has the profile that the element is defined in. A Bom element is defined in Core, an Sbom element is defined in Software based on Core properties. It doesn't matter to the Sbom element whether the spdxIds it contains point to Build or Licensing or AI types, the Sbom element itself is understandable simply by knowing [Core, Software].

The other way is like saying an HTML document is more than HTML if it links to TIFF images and PDF files. That way lies madness.

An SBOM element could have a separate "concluded profile" property that summarizes the whole tree below itself, but you get into how many levels deep do you go, and known unknowns, etc. What happens when most of your tree uses core and software, but 7 levels down in the dependency tree there's an AI?

IF causality is enforced (every graph must be built bottom-up so that every collection's members must exist at or before the collection itself can be created), AND the value of every element in a collection, not just its spdxId, must be known to the creator of the collection, AND elements may never be amended, THEN complete knowledge of the graph allows concluded_profile to be computed. But that seems a bit stiff just to label the profile that defines an individual element.
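As a rough illustration of what computing such a "concluded profile" would involve, here is a minimal Python sketch. It assumes a hypothetical in-memory map from element id to that element's own profiles and the ids it points to (collection members or relationship targets); it is not the spdx-tools API. A breadth-first walk unions the profiles it reaches, the optional `max_depth` stands in for the "how many levels deep" question, and ids that cannot be resolved are returned separately as the known unknowns.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical element: its own profiles plus the ids it points to."""
    spdx_id: str
    profiles: frozenset
    refs: tuple = ()


def concluded_profiles(root_id, elements, max_depth=None):
    """Union the profiles reachable from root_id.

    Returns (profiles, unresolved). `unresolved` holds referenced ids that
    are not present in `elements`, i.e. parts of the graph the creator
    cannot see.
    """
    profiles, unresolved, seen = set(), set(), set()
    # Breadth-first, so each element is first reached at its minimal depth.
    queue = deque([(root_id, 0)])
    while queue:
        spdx_id, depth = queue.popleft()
        if spdx_id in seen:
            continue
        seen.add(spdx_id)
        element = elements.get(spdx_id)
        if element is None:
            unresolved.add(spdx_id)
            continue
        profiles |= element.profiles
        # Only follow references while we are within the depth limit.
        if max_depth is None or depth < max_depth:
            queue.extend((ref, depth + 1) for ref in element.refs)
    return profiles, unresolved


elements = {
    "sbom": Element("sbom", frozenset({"core", "software"}), ("pkg", "missing")),
    "pkg": Element("pkg", frozenset({"software"}), ("model",)),
    "model": Element("model", frozenset({"ai"})),
}
print(concluded_profiles("sbom", elements))
print(concluded_profiles("sbom", elements, max_depth=1))
```

With the whole graph available, the concluded set for `sbom` picks up `ai` from two levels down; with `max_depth=1` it does not, and in both cases `missing` is reported as unresolved. That gap is exactly the ambiguity described above.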

@goneall
Member

goneall commented Jul 18, 2023

From the conversation above - it looks like we're going in the direction of each element having the set of supported profiles. This is what the model spec currently states, so if we agree on this, we can just close the issue.

@zvr - do you agree?

@jeff-schutt
Collaborator

The following proposal describes a way to use the profile property of creationInfo in a collection element.

This was discussed in today's tech meeting and had general support and agreement.

In short: the collection element (and every subclass) should semantically convey the profiles of all elements known to exist in its collection. This differs from every other SPDX 3.0 element that semantically conveys the profile in which the element was defined.

Scenario 1
Agent A creates a collection of SPDX 3.0 elements representing software that Agent A distributes. The software is distributed as a package containing multiple files.

  • Agent A is represented as an SPDX 3.0 element with "creationInfo": {"profile": {"core"}}.
  • Package X, File Y, File Z, etc. are each represented as individual SPDX 3.0 elements. When Agent A creates these elements, each element has its own "creationInfo": {"profile": {"software"}}.
  • Agent A creates an ElementCollection of type SpdxDocument that binds together all the elements described above.
  • The ElementCollection element includes "creationInfo": {"profile": {"core", "software"}} because ElementCollection should semantically convey all types of elements inside it. In this case, the Agent element is part of the Core Profile and the software package and file elements are part of the Software Profile, so the ElementCollection includes both profiles. Note: every SPDX 3.0 element is required to have a ProfileIdentifier listed as part of its CreationInfo, so there should never be a situation where any of the elements in the collection don't conform to at least one profile.
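A minimal Python sketch of the Scenario 1 rule, assuming plain dicts as a stand-in for SPDX 3.0 elements and JSON lists in place of the set-style braces used above. The ids and the `element` and `make_collection` helpers are hypothetical, and, as in the note below, mandatory fields are omitted.

```python
import json


def element(spdx_id, *profiles):
    """Hypothetical minimal element: an id plus creationInfo.profile."""
    if not profiles:
        # Every SPDX 3.0 element must conform to at least one profile.
        raise ValueError(f"{spdx_id}: at least one profile is required")
    return {"spdxId": spdx_id, "creationInfo": {"profile": list(profiles)}}


def make_collection(spdx_id, members):
    """Build a collection whose profile list is the union of its members'."""
    union = set()
    for member in members:
        union.update(member["creationInfo"]["profile"])
    return {
        "spdxId": spdx_id,
        "creationInfo": {"profile": sorted(union)},
        # Stand-in for the collection's member list.
        "element": [m["spdxId"] for m in members],
    }


agent_a = element("agent-a", "core")
package_x = element("package-x", "software")
file_y = element("file-y", "software")
file_z = element("file-z", "software")

document = make_collection("document-a", [agent_a, package_x, file_y, file_z])
print(json.dumps(document, indent=2))
```

Here the collection's profile list comes out as `["core", "software"]`: `core` from the Agent element and `software` from the package and file elements, while each member keeps its own single profile.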

Scenario 2
Agent B receives software and the corresponding SPDX Document from Agent A. Agent B intends to add security vulnerability information as part of Agent B's risk management process when evaluating software received from Agent A.

  • Agent B is represented as an SPDX 3.0 element with "creationInfo": {"profile": {"core"}}.
  • Security vulnerabilities affecting multiple files in the package are each represented as individual SPDX 3.0 elements. When Agent B creates these elements, each element has its own "creationInfo": {"profile": {"security"}}.
  • Agent B creates a new ElementCollection element that binds together file elements received from Agent A with newly created vulnerability elements by use of relationships.
  • Note that Agent B has the choice to reference elements from Agent A's collection or to recreate them in Agent B's collection. Assume security vulnerability information changes much more frequently than software composition information and assume Agent B is not modifying Agent A's software. Then the recommended way is to reference elements from Agent A's collection in Agent B's collection. In this case Agent B does not modify any of the elements received from Agent A.
  • If Agent B's new collection recreates Agent A's elements, then the newly created ElementCollection element includes "creationInfo": {"profile": {"core", "software", "security"}} because ElementCollection should semantically convey all types of elements inside it and Agent B has added new security vulnerability elements to the recreated package and file elements.
  • If Agent B's new collection only includes security vulnerabilities (recommended), then the newly created ElementCollection element includes "creationInfo": {"profile": {"core", "security"}} because ElementCollection should semantically convey all types of elements inside it and Agent B has created a collection including security vulnerability elements as well as the element representing Agent B.
  • In either case, Agent B creates a new relationship from Agent A's ElementCollection element to Agent B's ElementCollection element to track how the original data received is being enhanced, while maintaining provenance by preserving the original creationInfo of any elements created by Agent A.
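The two options in Scenario 2 differ only in which elements count as members of Agent B's collection. A minimal Python sketch under the same assumptions as before (plain dicts, hypothetical ids and helper names, JSON-style lists); `copy.deepcopy` merely stands in for "recreating" Agent A's elements, and the relationship between the two collections is left out.

```python
import copy


def element(spdx_id, *profiles):
    """Hypothetical minimal element: an id plus creationInfo.profile."""
    return {"spdxId": spdx_id, "creationInfo": {"profile": list(profiles)}}


def collection_profiles(members):
    """Union of the members' profiles, sorted for stable output."""
    union = set()
    for member in members:
        union.update(member["creationInfo"]["profile"])
    return sorted(union)


# Elements received from Agent A; their original creationInfo is kept as-is.
from_agent_a = [element("package-x", "software"), element("file-y", "software")]

# Elements newly created by Agent B.
agent_b = element("agent-b", "core")
vulnerabilities = [element("vuln-1", "security"), element("vuln-2", "security")]

# Recommended: reference Agent A's elements by id rather than including them.
referenced_members = [agent_b, *vulnerabilities]
referenced_ids = [e["spdxId"] for e in from_agent_a]

# Alternative: recreate Agent A's elements inside Agent B's collection.
recreated_members = [agent_b, *vulnerabilities, *copy.deepcopy(from_agent_a)]

print(collection_profiles(referenced_members))  # ['core', 'security']
print(collection_profiles(recreated_members))   # ['core', 'security', 'software']
```

Referencing keeps Agent B's collection limited to what Agent B actually created, which matches the recommendation above when vulnerability data changes more often than the software composition.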

Note: other fields, including mandatory fields in SPDX 3.0, were intentionally excluded for illustration purposes.

@goneall
Member

goneall commented Jul 18, 2023

Thanks @jeff-schutt - agree with your comment - very clear description of the scenarios.

Just one note/question - I personally think it would also be OK for the Agent creationInfo to include the additional profiles the Agent used when that Agent created the Collections. For the purposes of a descriptive example, I do think you should use the more restricted profile of core as you did in the above comment. The reason I bring this up is a common scenario where an Agent is producing a set of elements for a set of profiles - I think it would be OK (and efficient) for all of the elements to have the same set of profiles even if they don't really apply to the properties within that element. Let me know if you disagree.

@jeff-schutt
Collaborator

IMO having the profiles that the agent used when creating elements associated with the agent via the Agent's creationInfo instead of associated with the collection via the ElementCollection's creationInfo creates another level of indirection, adds complexity and confusion, and makes it harder for users to remember how to interpret the profile(s) listed in any element.

If I came across an agent described as an SPDX 3.0 element with multiple profiles listed within its creation info, I would anticipate that the Agent can produce SPDX Elements that conform to those profiles, e.g., a SoftwareAgent can create Agent, Package, Vulnerability, and License elements. Just because it can doesn't mean that it always will. By listing one or more profiles on the ElementCollection's creation info, we've semantically conveyed to a user that the collection in question, produced by the associated Agent element, does include elements from all of those profiles.

"I think it would be OK (and efficient) for all of the element to have the same set of profiles even if they don't really apply to the properties within that element."

@goneall If you mean "every element has the same set of profiles applied to its profile property, even if some profiles don't really apply to a specific element in the collection", I believe this efficiency would create inaccuracies and unnecessary complexity, causing confusion. If the intent is to reduce duplicate data, I suspect this comment is better discussed in #357, which discusses how to efficiently compact the data in a serialized collection. If the intent is to efficiently label every element with any possible profile, I believe this breaks the 3.0 model: every element should be able to stand on its own and convey accurate metadata about the object it describes. E.g., an element created for a vulnerability should not have a license profile associated with it.

If you mean "every collection element has the complete set of profiles for all its elements, even though profiles of various elements in the collection don't apply to other elements in the collection", yes, I agree.

To summarize, I propose an SPDX 3.0 restriction where only the Agent and the ElementCollection (or any subclass) can have multiple profiles listed. When ElementCollection lists multiple profiles, we semantically convey the various types of elements contained in the ElementCollection. When Agent lists multiple profiles, we semantically convey the various types of elements the Agent is capable of producing. IMO this should make SPDX 3.0 adoption easier.
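The proposed restriction is simple to check mechanically. A minimal Python sketch, assuming a simplified, hypothetical class hierarchy that only borrows SPDX 3.0 class names (Agent, SoftwareAgent, ElementCollection, SpdxDocument, Package, Vulnerability); it is not a model implementation and not part of any SPDX tool, and the profile strings are illustrative.

```python
class Element:
    """Simplified stand-in for an SPDX 3.0 element."""

    def __init__(self, spdx_id, *profiles):
        self.spdx_id = spdx_id
        self.profiles = frozenset(profiles)


class Agent(Element):
    """Multiple profiles mean: element types this agent can produce."""


class SoftwareAgent(Agent):
    pass


class ElementCollection(Element):
    """Multiple profiles mean: element types contained in the collection."""


class SpdxDocument(ElementCollection):
    pass


class Package(Element):
    pass


class Vulnerability(Element):
    pass


# Under the proposal, only these types (and their subclasses) may list
# more than one profile.
MULTI_PROFILE_TYPES = (Agent, ElementCollection)


def profile_violations(elements):
    """Return messages for elements that break the proposed restriction."""
    problems = []
    for el in elements:
        if not el.profiles:
            problems.append(f"{el.spdx_id}: no profile listed")
        elif len(el.profiles) > 1 and not isinstance(el, MULTI_PROFILE_TYPES):
            problems.append(
                f"{el.spdx_id}: {type(el).__name__} lists "
                f"{sorted(el.profiles)} but may only list one profile"
            )
    return problems


elements = [
    SoftwareAgent("tool", "core", "software", "security"),  # allowed: Agent
    SpdxDocument("doc", "core", "software"),  # allowed: collection
    Package("pkg", "software"),  # single profile
    Vulnerability("vuln", "security", "licensing"),  # flagged
]
for message in profile_violations(elements):
    print(message)
```

Only the vulnerability element is reported, mirroring the example above of a vulnerability that should not carry a license profile.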

@goneall
Member

goneall commented Jul 19, 2023

@jeff-schutt - It looks like we are thinking of the list of profiles quite differently. I suspect the disconnect is more related to the semantics of the profile than the serialization. We should probably sync up on one of our calls to align on a common understanding. Once we do that, we definitely should document profiles better.

@goneall
Member

goneall commented Jul 29, 2023

After our last tech call discussion on profiles, we came up with three profile definitions:

  • New Definitions:
    * Profile Team - used as areas of interest, organizing into teams.
    * Profile Namespace - grouping together "new" classes, properties & enumerations.
    * Profile Conformance Point - Additional restrictions on the model that state if you claim to support the

I noticed that the definition of the profile property relates to the Profile Namespace. I've been thinking of the property as representing the Profile Conformance Point.

At a minimum, I would propose that we split these two uses into two different properties.

In thinking about it, I don't think there is a need for the namespace-related profile property. The property itself has the namespace encoded in it - either directly in the case of the RDF property URI or in the context file if using JSON-LD. For other serializations, this static use of profiles can be looked up in the spec itself.
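To illustrate the point that the namespace is recoverable from the property itself, here is a minimal Python sketch. It assumes term IRIs shaped like `https://spdx.org/rdf/3.0.1/terms/Software/Package` (namespace as the second-to-last path segment) and a flat, hypothetical JSON-LD context mapping compact terms to full IRIs; both the IRI layout and the context contents are assumptions for illustration, and a real JSON-LD context is richer than a plain dict.

```python
from urllib.parse import urlparse


def profile_namespace(term_iri):
    """Return the namespace segment of a term IRI.

    Assumes IRIs shaped like <base>/<Namespace>/<term>; the sketch does not
    verify that layout, it only takes the second-to-last path segment.
    """
    segments = [s for s in urlparse(term_iri).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"cannot find a namespace segment in {term_iri!r}")
    return segments[-2]


def expand_term(term, context):
    """Resolve a compact term through a (hypothetical) flat context."""
    return context.get(term, term)


# Hypothetical context entries, for illustration only.
context = {
    "Package": "https://spdx.org/rdf/3.0.1/terms/Software/Package",
    "creationInfo": "https://spdx.org/rdf/3.0.1/terms/Core/creationInfo",
}

for term in ("Package", "creationInfo"):
    print(term, "->", profile_namespace(expand_term(term, context)))
```

Under these assumptions the loop prints `Package -> Software` and `creationInfo -> Core`, so a per-element property would only repeat information the vocabulary already carries.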

I would propose changing the definition of the profile property so that it relates to conformance points.

I'll create a draft proposal PR for consideration.

goneall added a commit that referenced this issue Jul 29, 2023
Fixes #365

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@goneall
Member

goneall commented Jul 29, 2023

Created #447 as a proposal

@zvr zvr closed this as completed in 19ab51a Oct 10, 2023