Provide a way to get the LayerID the package was first found in #435

Open
jonathongardner opened this issue Jun 11, 2021 · 5 comments · May be fixed by #1724
Labels
enhancement New feature or request

Comments

@jonathongardner

Add a flag that includes the layerID of the layer in which a package first appeared.

When tracking down a package (maybe because it has vulnerabilities, or because I'm not sure why it's in my SBOM), it would be helpful to know which layer it first showed up in, so I can look at the commands that were run to generate that layer.

Currently it looks like the layerID returned under locations is the last layer in which the path was touched. For example, if I did something like:

# layer1
FROM alpine
# layer2
RUN apk add --update nodejs-current=15.10.0-r0

The package "busybox" version "1.32.1-r6" has a layerID of layer2, even though it comes from the base image in layer1. I know there is the "--scope all-layers" option, but that would also return any packages that were removed from the final image.
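
To make the request concrete, here is a minimal sketch of the lookup being asked for. It assumes each package entry carries a `locations` list whose items have a `layerID` field (as in syft's JSON output), and that the image's layer IDs are known in build order. The function name `first_layer_id` and the sample layer IDs are hypothetical, not part of syft.

```python
def first_layer_id(package, layer_order):
    """Return the earliest layer (by build order) among a package's locations.

    `package` is one entry shaped like a syft JSON artifact (assumed shape).
    `layer_order` lists layer IDs from the base layer to the top layer.
    """
    # Map each layer ID to its position in the build order.
    rank = {layer_id: i for i, layer_id in enumerate(layer_order)}
    layer_ids = [
        loc["layerID"]
        for loc in package.get("locations", [])
        if loc.get("layerID") in rank
    ]
    if not layer_ids:
        return None
    return min(layer_ids, key=rank.__getitem__)


# Hypothetical data: busybox is recorded in the apk database in both layers.
layers = ["sha256:layer1", "sha256:layer2"]
busybox = {
    "name": "busybox",
    "version": "1.32.1-r6",
    "locations": [
        {"path": "/lib/apk/db/installed", "layerID": "sha256:layer2"},
        {"path": "/lib/apk/db/installed", "layerID": "sha256:layer1"},
    ],
}
print(first_layer_id(busybox, layers))
```

This only helps when the locations list every layer the package was found in. With the default scope only the last layer is reported, which is the gap this issue describes.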

@jonathongardner jonathongardner added the enhancement New feature or request label Jun 11, 2021
@bureado bureado mentioned this issue Jun 29, 2021
2 tasks
@wagoodman
Contributor

The work proposed in #32 aims to deduplicate packages in a way where the same package found in multiple locations would be listed as a single package and have multiple entries on the .locations array on the artifact (in the json format). If this was implemented then the first item on the .locations array would answer the question of "where was the first instance of the particular package found".

@jonathongardner
Author

@wagoodman I see #32 was closed. I checked it out and it looks good, but I don't think it really solves this problem (though it is a little more helpful). The issue still exists: if I run with --scope all-layers, I can see the layer the package first shows up in (and now, because of the deduplication, it is in one package, which is somewhat helpful), but I still get packages that might not be in the final image (I can provide an example of this if needed). If I run without --scope all-layers, it still only returns the last layer the component was touched in. For deb/alpine packages that is confusing, because the package manager DB is touched whenever I do an install, so it is always the last layer in which I ran a package manager install.

Right now, to get around this, I run syft with --scope squashed and build an array of package IDs (so I know which packages are in the final image), then run syft with --scope all-layers and filter out the packages that are not in that array.
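
The workaround above could be scripted roughly as follows. This is a sketch, assuming both runs were saved as JSON (for example with `-o json`) and that each document has a top-level `artifacts` array. Matching on name, version and type instead of the `id` field is an assumption made here, since it is not confirmed that IDs are identical across scopes; the helper names are hypothetical.

```python
import json


def package_key(artifact):
    # Assumption: name, version and type identify a package across both runs.
    return (artifact.get("name"), artifact.get("version"), artifact.get("type"))


def filter_to_final_image(squashed_doc, all_layers_doc, key=package_key):
    """Keep only all-layers packages that also appear in the squashed run."""
    keep = {key(a) for a in squashed_doc.get("artifacts", [])}
    return [a for a in all_layers_doc.get("artifacts", []) if key(a) in keep]


def load(path):
    # Read one saved syft JSON document from disk.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical, trimmed-down documents standing in for the two syft runs.
squashed = {"artifacts": [{"name": "busybox", "version": "1.32.1-r6", "type": "apk"}]}
all_layers = {
    "artifacts": [
        {"name": "busybox", "version": "1.32.1-r6", "type": "apk"},
        {"name": "removed-pkg", "version": "0.1", "type": "apk"},
    ]
}
for pkg in filter_to_final_image(squashed, all_layers):
    print(pkg["name"], pkg["version"])
```

If the `id` values do turn out to match between the two runs, passing `key=lambda a: a["id"]` reproduces the ID-based filtering described above.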

@spiffcs spiffcs added this to OSS May 4, 2022
@spiffcs spiffcs moved this to Triage (Comments or Progress Made) in OSS May 31, 2022
@spiffcs spiffcs moved this from Triage (Comments or Progress Made) to Backlog (Pulled Forward for Priority no more than 10) in OSS Jul 8, 2022
@wagoodman
Contributor

wagoodman commented Mar 24, 2023

I think there is a path forward on this one. We would need to create a new image-based FileResolver that would act a little like both the squashed resolver and the all-layers resolver. The squashed resolver returns a location for all paths in the squashed representation. The all-layers resolver returns one or more locations for all paths in all layers.

We really want something that would return all locations from all layers for all paths in the squashed representation. In this way the catalogers would have visibility into all places where the file was introduced/changed and the existing downstream package merging logic would account for packages that are the same and found in the same path across multiple layers.

This could be selectable by a new scope like --scope squashed-with-all-layers (a terrible name, but just as an example).

From an implementation point of view, this would look an awful lot like the existing all-layers resolver today with an additional filtering step based on a query to the squashed representation. The catalogers would catalog all location instances, raising up duplicates, and the set of duplicates would be merged. The single merged package would have pkg.Locations populated with all layers in which the package definitions were found.

This means that for a dpkg package that was added in layer 1, with other packages installed in later layers, the shared database causes a location to be added to the package for every layer in which the database file was modified, from the layer where the package was installed onward. This case is a little awkward, but it is accurate relative to what syft understands about the package, and it seems like a good first step.
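
The proposed behavior can be illustrated with a toy model. This is not syft's resolver code: layers are modeled as plain dicts mapping a path to the set of package names recorded in that file, `None` is a hypothetical marker for a path deleted in a layer, and all function names are made up for the sketch.

```python
WHITEOUT = None  # hypothetical marker: the path was deleted in this layer


def squashed_paths(layers):
    """Paths visible in the final image after applying layers in order."""
    visible = set()
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                visible.discard(path)
            else:
                visible.add(path)
    return visible


def squashed_with_all_layers(layers):
    """(layer_index, path) for every layer that added or changed a path,
    filtered to paths that still exist in the squashed representation."""
    final = squashed_paths(layers)
    return [
        (i, path)
        for i, layer in enumerate(layers)
        for path, content in layer.items()
        if content is not WHITEOUT and path in final
    ]


def merged_package_layers(layers):
    """Merge duplicate packages, collecting every layer index they were seen in."""
    merged = {}
    for i, path in squashed_with_all_layers(layers):
        for name in sorted(layers[i][path]):
            merged.setdefault(name, []).append(i)
    return merged


DB = "/var/lib/dpkg/status"
layers = [
    {DB: {"pkg-a"}, "/tmp/build.log": set()},
    {DB: {"pkg-a", "pkg-b"}, "/tmp/build.log": WHITEOUT},
    {DB: {"pkg-a", "pkg-b", "pkg-c"}},
]
print(squashed_with_all_layers(layers))
print(merged_package_layers(layers))
```

In this model `pkg-a` ends up with a location in every layer that touched the shared database, which is the awkward but accurate case described above, and the first entry in each list is the layer where the package first appeared. The deleted `/tmp/build.log` contributes no locations because it is absent from the squashed view.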

@tomerse-sg

What is the status of this request? It would be very useful :)

@tomersein
Contributor

Please look at this PR: #3138

Projects
Status: Backlog

4 participants