Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image-index: add the subject field #1020

Merged
merged 1 commit into from
Jun 15, 2023
Merged

Conversation

vbatts
Copy link
Member

@vbatts vbatts commented Feb 12, 2023

Fixes: #1002

This allows for an image-index / manifest-list (still the better name than "image-index" ...) to point to the blob of another manifest. This is a part of the referrer API in the distribution-spec.

Signed-off-by: Vincent Batts vbatts@hashbangbash.com

Fixes: opencontainers#1002

This allows for an image-index / manifest-list (_still_ the better name
than "image-index" ...) to point to the blob of another manifest.
This is a part of the referrer API in the distribution-spec.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
@brackendawson
Copy link

Does this not lead to the same logical loop that @sudo-bmitch mentioned here: #1012 (comment)

@sudo-bmitch
Copy link
Contributor

Yes, this makes it possible create an infinite loop in tools I'm working on. I'm recursively processing a tag, all child manifests, and all referrers to those manifests. If an index can have a child manifest and subject manifest, pointing to the same digest or any of its children, this would create a loop.

@vbatts
Copy link
Member Author

vbatts commented Feb 12, 2023 via email

@jonjohnsonjr
Copy link
Contributor

this would create a loop

Can you describe what operation you are trying to perform that would actually encounter such a loop? I am having a hard time seeing it.

@sudo-bmitch
Copy link
Contributor

this would create a loop

Can you describe what operation you are trying to perform that would actually encounter such a loop? I am having a hard time seeing it.

Showing the full tree of a manifest and all child objects of that manifest, recursively. So I can point to a multi platform image and see that it has an SBOM attached to specific platform images, and that SBOM has been signed with a tool like cosign.

@vbatts
Copy link
Member Author

vbatts commented Feb 12, 2023 via email

@sudo-bmitch
Copy link
Contributor

Yes, the most likely cause is a malicious user. The result is that various tooling processing that malicious content would experience a DOS, likely from a stack growing from too much recursion.

Before this PR, OCI has always been a DAG as long as no one could create a hash collision. After this PR, we no longer have a logical DAG (logical in that referrers are treated as child objects of the manifest they refer to). That's going to change design decisions of any tooling written to perform recursive operations.

@vbatts
Copy link
Member Author

vbatts commented Feb 13, 2023

OCI has always been a DAG

fair, but this subject workflow is a new thing to OCI. So it's up to us currently to iron topics like this out. The nature of being able to add a reference to something else introduces this possibility. It seems like something that needs guidance so that tools know what to expect.

Just to complicate this matter, despite the OPTIONAL recommendation an manifest and image-index to use mediaType field to self-identify, as Jon mentioned the norm is to duck type for the fields.
So, the following example is currently both a valid image-index and manifest:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 7143,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "size": 7143,
    "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
  },
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 1470,
    "digest": "sha256:c86f7763873b6c0aae22d963bab59b4f5debbed6685761b5951584f6efb0633b"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 675598,
      "digest": "sha256:9d3dd9504c685a304985025df4ed0283e47ac9ffa9bd0326fddf4d59513f0827"
    }
  ]
}

vbatts added a commit to vbatts/oci-image-spec that referenced this pull request Feb 13, 2023
spurred from opencontainers#1020 (comment)

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
@vbatts
Copy link
Member Author

vbatts commented Feb 13, 2023

@jonjohnsonjr
Copy link
Contributor

(logical in that referrers are treated as child objects of the manifest they refer to)

I think this is where we disagree. Why treat them as child objects?

@sudo-bmitch
Copy link
Contributor

(logical in that referrers are treated as child objects of the manifest they refer to)

I think this is where we disagree. Why treat them as child objects?

For me it's the logical order of walking the graph. I pull the tag, possibly multi-platform, I pull the image with layers, and I pull the SBOM and signature that refer to that those manifests. I don't deploy an SBOM to a Kubernetes cluster and then use that to find the image that needs to be run.

What other methods are there to treat this relationship that would be more appropriate and support user workflows?

@jonjohnsonjr
Copy link
Contributor

I don't deploy an SBOM to a Kubernetes cluster and then use that to find the image that needs to be run.

This seems like a different operation than rendering a tree.

What other methods are there to treat this relationship that would be more appropriate and support user workflows?

I would treat it like the DAG that it is. It really depends on the exact workflow.

For example, rendering a tree isn't really possible because the data structure isn't a tree. You need to render two different trees if you're going to invert some of the edges, unless you're okay with handling "loops".

If you're deploying an image and want to copy everything you need for that image into a different registry, one algorithm would be:

  1. Walk backwards up the graph recursively to enumerate all the referrers, which are your roots.
  2. Walk forwards through all your roots, copying everything that is reachable.

You won't have any loops in step 1 because inverting all the edges of a DAG can't create any loops.
You won't have any loops in step 2 because you're just walking the DAG like normal.

The reason you're getting "loops" is that you're treating the DAG and the inverted DAG like they're the same graph, but they're not. You can't hop between them without causing issues in the same way that you can't hop the median of the highway without causing issues.

@sudo-bmitch
Copy link
Contributor

  1. Walk backwards up the graph recursively to enumerate all the referrers, which are your roots.
  2. Walk forwards through all your roots, copying everything that is reachable.

Recursively walking backwards in 1 has to be done for every node in 2. So the recursive part of 1 can add the same root to the list.

If it's only copying, eventually you'll see the destination already has a given root (though that gets harder if you are repairing a corrupt registry and process each node even when the root already exists). If it's displaying the details of an image and associated metadata, that's not guaranteed to be a tree anymore. If it's a GC algorithm, hopefully it stops marking nodes when it sees they are already marked.

I agree that none of this is impossible or even difficult to handle on the client side. But it is a change in the logic that's been included everywhere else and I'll understand if some don't rewrite their algorithms and just hope they never receive bad data.

@jonjohnsonjr
Copy link
Contributor

Recursively walking backwards in 1 has to be done for every node in 2. So the recursive part of 1 can add the same root to the list.

I don't understand what operation would require doing this. It's possible, sure, but when would you need to do this?

@jonjohnsonjr
Copy link
Contributor

Should index also have an artifactType?

@tianon
Copy link
Member

tianon commented Apr 13, 2023

@jonjohnsonjr would you be willing to write down the use case for this that you shared during the call today? 🙇

(I shared one too, but mine was super contrived and not useful, so I think yours is a lot more helpful as context here.)

@vbatts since you weren't on today's call, perhaps you could also share one or two use cases you see for this? 😇 (or further expand on the rationale for including it 🙏)

@jonjohnsonjr
Copy link
Contributor

jonjohnsonjr commented Apr 20, 2023

would you be willing to write down the use case for this that you shared during the call today?

Let's say we have an index where each child has an SBOM pointing to it:

graph TD;

lsbom[sbom]-.->linux;

index-->linux;
index-->windows;

wsbom[sbom]-.->windows;
Loading

It would be nice to be able to attach an SBOM to the index as well that is just an aggregation of all the child SBOMs, like this:

graph TD;

sboms-->lsbom;
sboms-.->index;
sboms-->wsbom;

lsbom[sbom]-.->linux;

index-->linux;
index-->windows;

wsbom[sbom]-.->windows;
Loading

So you can ask the registry for referrers of the index and get an artifact that represents all the sboms for that index.

This is a little nicer than having to walk down then up to collect all the SBOMs for a given tag.

@sudo-bmitch
Copy link
Contributor

It would be nice to be able to attach an SBOM to the index as well that is just an aggregation of all the child SBOMs...
So you can ask the registry for referrers of the index and get an artifact that represents all the sboms for that index.

I'm going to channel my inner @jonjohnsonjr and say I don't like this example. It works for the image producer, but maintainability and extensibility raise a lot of concerns. If I'm adding an updated SBOM to one of the child images, I don't necessarily know all the parent index manifests that reference that image, and can't go back and update their respective index of SBOMs.

I'd prefer for this scenario be solved with an option on the referrers API where registries could return recursive results (and clients would fall back to walking the child manifests if the server doesn't support the option). That allows the registry to ensure the response is always updated with any changes to referrers to child manifests.

What I'm hoping to see is something like a CNAB type of artifact, that is too complex to represent with a single image manifest, and which also has a need to refer to another manifest (rather than including it as one of it's own child manifests).

@jonjohnsonjr
Copy link
Contributor

If I'm adding an updated SBOM to one of the child images, I don't necessarily know all the parent index manifests that reference that image, and can't go back and update their respective index of SBOMs.

Definitely! That's why I expressed that it was important for referrers to support any relationships between objects, not just the subject field.

I'd prefer for this scenario be solved with an option on the referrers API where registries could return recursive results

I don't love the sound of pushing recursive results onto the registry.

What I'm hoping to see is something like a CNAB type of artifact, that is too complex to represent with a single image manifest, and which also has a need to refer to another manifest (rather than including it as one of it's own child manifests).

cc @aw185176 this sounds a lot like you

@aw185176
Copy link

cc @aw185176 this sounds a lot like you

Definitely. I think most/all package management use cases would fall into the
same bucket.

Internally at NCR we have built a K8s package management toolchain that stores packages as either images or indexes. Because this pre-dates the referrers API, we represent dependency relationships using indexes. Forgive the terrible legend:

  • Hexagons are index manifests
  • Rectangles are image manifests
flowchart TB

root{{guestbook:latest}}-->gb["guestbook@sha256:..."]

root-->redis["redis@sha256:..."]
root-->nginx["nginx@sha256:..."]
Loading

And just like any other packages, your deps brought deps:

flowchart TB

root{{guestbook:latest}}-->gb["guestbook@sha256:..."]

root-->redis["redis@sha256:..."]
root-->nginx{{"nginx@sha256:..."}}-->a["foo@sha256:..."] & b["cowsay@sha256..."]
Loading

We also support packages providing variants for specific K8s distros, so the final graph for a package with dependencies and provider variants might look like:

flowchart TB

root{{guestbook:latest}}-->gb["guestbook:gke"] & gba["guestbook:eks"] & gbc["guestbook:bespoke"]

root-->redis["redis@sha256:..."]
root-->nginx{{"nginx@sha256:..."}}-->ng["nginx:gke"] & na["nginx:eks"]
Loading

Using subject to represent package dependencies would be our preference, but even if we unwound our dependencies to flatten our graph, a package with provider variants would still be an index. Like I said, I suspect this kind of dynamic would exist in any similar solution that has support for multi-platform artifacts. I don't think it would be hard to imagine a homebrew package with dependencies (assuming they wanted to define their dependency graph in the registry) being shaped similarly to what I've laid out and running into the same issues.

@sudo-bmitch
Copy link
Contributor

@aw185176

Using subject to represent package dependencies would be our preference

I'm not quite sure I see the use case. The subject/referrers allows you to push a manifest that extends another after the fact. So you would be able to push the guestbook image, and later push a multiplatform nginx image that indicates it extends the guestbook image. If you wanted to use that nginx image for another app, you would need to push a separate nginx manifest with a different subject. Clients copying the guestbook could decide not to include nginx, and inject their own database, or maybe multiple database images, that all extend the guestbook app. If they wanted to use the same postgres image for their to-do app, they would again need to push a different manifest to include the new subject.

Is that what you have in mind? As an end user, that feels very confusing and unpredictable, where an index without the subject feels much more predictable.

Copy link
Contributor

@sudo-bmitch sudo-bmitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is done, we need to add artifactType, and update the guidance to show how artifacts would be created using an index. I'd like to see at least one tool (oras, crane, cosign, regclient, etc) that is using this in a real use case.

@sudo-bmitch sudo-bmitch added this to the v1.1.0 milestone May 25, 2023
@sajayantony
Copy link
Member

+1 on getting artifactType into Index. I remember the conversations where CNAB wanted to use index but had to resort to the first image or annotations due to missing config blob, and the only reason for wanting the config blob was because of config.mediaType which was equivalent to artifactType.

@brackendawson
Copy link

The discussion from 2023-04-13 has now been uploaded, it's at this time index: https://www.youtube.com/watch?v=Bx-urZXikMk&t=2898s

To me, @tianon's example where you have an SBOM and provenance artifact and an index of both of those returned by the referrers API is weird, because the return type of the referrers API already is that OCI Index.

@jonjohnsonjr's example makes sense to me.

It would be nice to be able to attach an SBOM to the index as well that is just an aggregation of all the child SBOMs, like this:

graph TD;

sboms-->lsbom;
sboms-.->index;
sboms-->wsbom;

lsbom[sbom]-.->linux;

index-->linux;
index-->windows;

wsbom[sbom]-.->windows;
Loading

So you can ask the registry for referrers of the index and get an artifact that represents all the sboms for that index.

This is a little nicer than having to walk down then up to collect all the SBOMs for a given tag.

But this is over-constrained and unchecked, what I put my naughty hat on and do this:

graph TD;

sboms-->lsbom;
sboms-.->index;
sboms-->wsbom;

lsbom[sbom];
xsbom[sbom]-.->linux;

index-->linux;
index-->windows;

wsbom[sbom]-.->windows;
Loading

Do I now have a security vulnerability because different client implementations will see different SBOMs for the same image?

@sudo-bmitch
Copy link
Contributor

For those that have already implemented this, how do those implementations handle the following: ghcr.io/sudo-bmitch/oci-sandbox:loop

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 417,
      "digest": "sha256:569d87a6e370c66360ef2a2a61fc7477ba0f282ce0e84c93f7d6685c220df445",
      "annotations": {
        "org.opencontainers.image.ref.name": "hello"
      }
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "size": 417,
    "digest": "sha256:569d87a6e370c66360ef2a2a61fc7477ba0f282ce0e84c93f7d6685c220df445"
  }
}

And the sha256:569... manifest is a simple image/artifact:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/example.hello",
  "config": {
    "mediaType": "application/vnd.oci.scratch.v1+json",
    "size": 2,
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a"
  },
  "layers": [
    {
      "mediaType": "application/octet-stream",
      "size": 12,
      "digest": "sha256:a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447"
    }
  ]
}

Are there UI's to display it with a folder like structure? Code to copy images and their referrers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow subject on index manifest
7 participants