Allow exploded package descriptor as alternative to resource descriptor URI #1017

Open
laurentsimon opened this issue Jan 19, 2024 · 3 comments

@laurentsimon
Contributor

laurentsimon commented Jan 19, 2024

The in-toto resource descriptor used in SLSA provenance has a URI field, which is required unless content or a digest is present:

https://github.com/in-toto/attestation/blob/main/spec/v1/resource_descriptor.md
https://github.com/in-toto/attestation/blob/main/spec/v1/field_types.md#ResourceURI

I'd like to propose an alternative based on my experience using package URLs.

Most use cases I've encountered require partial matching on URIs (package name, version, registry, something else). But URIs / URLs force us to squeeze all of that into a single field, so we have to parse / explode each component before we can use it. Take the example of search in a database: we always want to index on individual components. Attestations are another example: we typically want to let users match on package name, and optionally on version.

One scenario where a URL is useful is when humans need to write, copy and interpret it, so web URLs make sense to me. End users do not manipulate package URLs / URIs: they interact with a package manager. So, in my experience, package URLs do not provide a useful abstraction in practice. I have personally struggled to use them. Another example is the container PURL form image@version, which is not compatible with the image:version form we use to pull container images from an OCI registry.

I wonder if there would be interest in defining a "package descriptor", essentially a hash map with predefined fields, something to the effect of:

type PackageDescriptor struct {
	// Ecosystem, as defined by OSV schema https://google.github.io/osv.dev/data/#covered-ecosystems
	Ecosystem string `json:"ecosystem,omitempty"`
	// Package name.
	Name string `json:"name,omitempty"`
	// Package registry.
	Registry string `json:"registry,omitempty"`
	// Package version.
	Version string `json:"version,omitempty"`
	// Package architecture.
	Arch string `json:"arch,omitempty"`
	// The package target distro.
	Distro string `json:"distro,omitempty"`
	// Package environment (debug, prod, etc).
	Environment string `json:"environment,omitempty"`
	// We may define this structure as simply a map[string]string.
}

Given a package descriptor, anyone can decide which components to use for their use case. For example, to index on package ownership, we can use registry + package name as the "index".
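To make the partial-matching idea concrete, here is a minimal sketch. It uses a trimmed-down copy of the struct above, and `OwnershipKey` and `Matches` are hypothetical names invented for this illustration, not part of any existing spec or library. The package and registry values are sample data only.

```go
package main

import "fmt"

// PackageDescriptor is a trimmed-down copy of the struct proposed above.
type PackageDescriptor struct {
	Ecosystem string
	Name      string
	Registry  string
	Version   string
}

// OwnershipKey is a hypothetical index key built from registry + name only.
// Struct values with comparable fields can be used directly as Go map keys.
type OwnershipKey struct {
	Registry string
	Name     string
}

// Matches reports whether d satisfies want. Empty fields in want act as
// wildcards, so callers can match on name only, or on name + version.
func Matches(d, want PackageDescriptor) bool {
	return (want.Ecosystem == "" || want.Ecosystem == d.Ecosystem) &&
		(want.Name == "" || want.Name == d.Name) &&
		(want.Registry == "" || want.Registry == d.Registry) &&
		(want.Version == "" || want.Version == d.Version)
}

func main() {
	// Sample data for illustration only.
	pkgs := []PackageDescriptor{
		{Ecosystem: "npm", Name: "left-pad", Registry: "registry.npmjs.org", Version: "1.3.0"},
		{Ecosystem: "npm", Name: "left-pad", Registry: "registry.npmjs.org", Version: "1.2.0"},
	}

	// Index every version under the same registry + name key.
	index := map[OwnershipKey][]PackageDescriptor{}
	for _, p := range pkgs {
		k := OwnershipKey{Registry: p.Registry, Name: p.Name}
		index[k] = append(index[k], p)
	}

	k := OwnershipKey{Registry: "registry.npmjs.org", Name: "left-pad"}
	fmt.Println(len(index[k]))

	// Match on name only, ignoring version.
	fmt.Println(Matches(pkgs[0], PackageDescriptor{Name: "left-pad"}))
}
```

No parsing step is involved: each consumer picks the fields it cares about and ignores the rest.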

If one needs to convert a descriptor into a unique "hash / index" for lookup and indexing, they can serialize it (e.g., sort the fields and join them with a delimiter) and hash the result. If we want a "developer friendly" form to help with debugging, we serialize only and skip the hash. It's easy to define how to do that for a package descriptor. Defining serialization once seems simpler than parsing and defining URL semantics (for each ecosystem? package-url/purl-spec#281, package-url/purl-spec#279).
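As a rough sketch of the "sort the fields and use a delimiter" idea, assuming the descriptor is represented as a map[string]string: the function names `Canonical` and `Digest`, the `;` and `=` delimiters, the escaping choice, and SHA-256 are all assumptions made for this illustration, not something the proposal specifies.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// Canonical serializes a descriptor into a stable, readable string.
// Keys are sorted, empty values are skipped, and keys and values are
// query-escaped so the ";" and "=" delimiters cannot appear inside them.
func Canonical(d map[string]string) string {
	keys := make([]string, 0, len(d))
	for k, v := range d {
		if v != "" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, url.QueryEscape(k)+"="+url.QueryEscape(d[k]))
	}
	return strings.Join(parts, ";")
}

// Digest hashes the canonical form, for use as a fixed-size lookup key.
func Digest(d map[string]string) string {
	sum := sha256.Sum256([]byte(Canonical(d)))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Sample data for illustration only.
	d := map[string]string{
		"version":   "1.3.0",
		"name":      "left-pad",
		"ecosystem": "npm",
	}
	fmt.Println(Canonical(d)) // ecosystem=npm;name=left-pad;version=1.3.0
	fmt.Println(Digest(d))
}
```

Because the keys are sorted before joining, two descriptors with the same fields always produce the same string regardless of how the map was built, which is the property a URL does not give for free.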

Is this view controversial? Where do we need the semantics of a URL in practice?

@woodruffw you may have a set of arguments in favor of URLs (I saw you proposed using them in https://github.com/trailofbits/homebrew-attestation/tree/main/specs/publish/v0.1). Would love to hear your point of view

@mihaimaruseac

We are doing something similar in GUAC since pURLs are not always descriptive enough

@woodruffw

> @woodruffw you may have a set of arguments in favor of URLs (I saw you proposed using them in https://github.com/trailofbits/homebrew-attestation/tree/main/specs/publish/v0.1). Would love to hear your point of view

I have no strong "pro" or "con" position when it comes to PURLs; the main reason I used them in that attestation was because they're a known quantity and (in Homebrew's case) I have enough control over the spec itself to enforce a particular "shape" for each URL. As best I can tell, here are the pros and cons of PURLs:

  • Pro: there's a standardization effort behind them
  • Pro: they're (nominally) consumable by an ordinary URL parser
  • Pro: they have accommodations for carrying additional metadata
  • Con: URLs themselves aren't well specified, and existing URL parsers tend to be very permissive with their inputs
  • Con: they're malleable (multiple valid ways to represent a single package), meaning they can't naively be used as indices/keys
  • Con: they don't degrade gracefully (anything that doesn't cleanly fit needs to be percent-encoded, ablating some of the benefits of using a short human friendly string)
  • Con: they're still a work in progress (not every major ecosystem has a stable PURL namespace yet, or even a WIP one)
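The malleability point can be illustrated with a small sketch. It only uses Go's standard percent-decoding, not a real PURL parser, and `EqualAfterDecoding` is a hypothetical helper. The two strings and the version number are examples chosen for illustration; whether a given PURL parser accepts both spellings is an assumption here.

```go
package main

import (
	"fmt"
	"net/url"
)

// EqualAfterDecoding reports whether two strings are identical once
// percent-encoding is removed. This is NOT full PURL normalization;
// it only shows one way two spellings can refer to the same thing.
func EqualAfterDecoding(a, b string) (bool, error) {
	da, err := url.PathUnescape(a)
	if err != nil {
		return false, err
	}
	db, err := url.PathUnescape(b)
	if err != nil {
		return false, err
	}
	return da == db, nil
}

func main() {
	// Two spellings of the same scoped npm package (illustrative values).
	a := "pkg:npm/%40angular/core@16.0.0"
	b := "pkg:npm/@angular/core@16.0.0"

	fmt.Println(a == b) // false: a naive string key treats them as different

	same, err := EqualAfterDecoding(a, b)
	if err != nil {
		panic(err)
	}
	fmt.Println(same) // true: they decode to the same string
}
```

Any system that uses the raw string as a map key or database index would store these as two different packages unless it normalizes first.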

@mlieberman85
Member

I feel like this just trades one standard for a new one. I would be a bit worried that tools now have to support this new descriptor. For example, a lot of existing open source and commercial tools support purl, as do other metadata documents like SBOMs. These tools and metadata document specifications would require updates to use the new descriptor format. FWIW I've run into similar issues and have highlighted them here: package-url/purl-spec#242

Separately, if you look at OpenSSF's response to the CISA RFC on Software Identification Ecosystem Analysis, we do some analysis there, and I do think we should be consistent with it: https://openssf.org/blog/2023/12/11/openssf-responds-to-the-cisa-rfc-on-software-identification-ecosystem-analysis/
