p2 repository -> SBOM #1

Open
merks opened this issue Nov 8, 2023 · 6 comments

@merks
Owner

merks commented Nov 8, 2023

I did some prototyping how one might generate an SBOM from a p2 repository.

The prototype creates a bom.xml from the Eclipse Installer self-contained repository:

https://download.eclipse.org/oomph/products/repository

It's implemented by this simple (very rough prototype) p2 application which loads the repository and analyzes the metadata and actual artifacts to produce a CycloneDX representation:

SBOMApplication.java

You can see that the bom.xml includes dependency information by wiring requirements to capabilities.
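As a rough illustration of what "wiring requirements to capabilities" means, here is a minimal sketch in Python. It is not the code from SBOMApplication.java; the `Unit` and `Capability` types are hypothetical stand-ins for p2 installable units, and version ranges are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    """A (namespace, name) pair; hypothetical simplification without versions."""
    namespace: str  # e.g. "java.package" or "osgi.bundle"
    name: str


@dataclass
class Unit:
    """Hypothetical stand-in for an installable unit in the repository."""
    id: str
    version: str
    provides: list = field(default_factory=list)
    requires: list = field(default_factory=list)  # requirements, matched by equality


def wire(units):
    """Map each unit id to the sorted ids of the units that satisfy its requirements."""
    providers = {}
    for unit in units:
        for cap in unit.provides:
            providers.setdefault(cap, set()).add(unit.id)

    edges = {}
    for unit in units:
        deps = set()
        for req in unit.requires:
            deps |= providers.get(req, set())
        deps.discard(unit.id)  # a unit does not depend on itself
        edges[unit.id] = sorted(deps)
    return edges
```

The resulting mapping is the kind of information that would end up in the dependency section of the bom.xml.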

I've done some extracting of license information too, e.g., what I could find in the pom in the artifact jar, but that information is not always easy to track down in a consistent way or place. Also, I don't want to duplicate work that has already been done, e.g., by SPDX or Dash.
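To make the pom-based lookup concrete, here is a minimal Python sketch that reads license entries from a pom.xml embedded in a jar. It assumes the jar was built by Maven and carries its pom under `META-INF/maven/<groupId>/<artifactId>/pom.xml`; the function name `pom_licenses` is made up for illustration, and the prototype may do this differently.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files that declare one.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_licenses(jar_bytes):
    """Return (name, url) pairs from any pom.xml found under META-INF/maven/."""
    found = []
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for entry in jar.namelist():
            if entry.startswith("META-INF/maven/") and entry.endswith("/pom.xml"):
                root = ET.fromstring(jar.read(entry))
                # Only matches POMs that use the standard namespace; a POM
                # without a namespace, or one that inherits its license from
                # a parent POM, yields nothing here.
                for lic in root.findall("m:licenses/m:license", POM_NS):
                    name = lic.findtext("m:name", default="", namespaces=POM_NS).strip()
                    url = lic.findtext("m:url", default="", namespaces=POM_NS).strip()
                    found.append((name, url))
    return found
```

An empty result is common in practice, which is one reason this information is hard to track down consistently.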

My sense is that one can produce some quite good information relatively quickly, but then one starts to slide down the slippery slope of complexity. Also many questions remain/arise about the form of the representation and how well these concepts apply to the OSGi world of bundles as well as the Equinox world of features.

The IDE WG's budget can't accommodate what could turn into weeks or months of work, and it's not entirely clear yet in which form folks want to do the analysis. I.e., there appears to be mention of producing an SBOM as part of the build; implementing something that runs in Maven versus something that runs as an OSGi application is quite different technically.

So before investing more time, it would be good to discuss the strategy and plans around this...

Below is more background information and details.


I am reusing CycloneDX's core library, which has quite a few dependencies:

[image: dependencies of the CycloneDX core library]

To use this stuff in an OSGi environment, i.e., in the above application where p2 functions, one needs quite a few new dependencies, which could be added to Orbit; I did that locally:

[two images: the new dependencies prepared locally for Orbit]

I also looked at SPDX's dependencies, but that's a massive list and is clearly designed to be used within a Maven build:

[image: dependencies of the SPDX library]

@waynebeaton

I guess that I should make the Dash License Tool's core library an OSGi bundle. I should probably do that anyway (and factoring out the CLI stuff is overdue). FWIW, SPDX doesn't provide license information about content; they're more about defining how to specify license information. The Dash License Tool already knows how to handle our rules with regard to ClearlyDefined and the IPLab database.

At least internally, we've latched onto CycloneDX. My recommendation is that we focus on that, leaving SPDX SBOM generation as a nice-to-have for later consideration.

I don't have an opinion regarding whether or not content should be added to Orbit. If you feel that it should be, then we can try to find some resources to help.

> I did some prototyping how one might generate an SBOM from a p2 repository

What we'd like to produce first is an SBOM for an Eclipse IDE. I'm quite disconnected from the p2 technology... can we think of the configuration of a particular IDE product as a p2 repository? Or is this a different problem?

> My sense is that one can produce some quite good information relatively quickly, but then one starts to slide down the slippery slope of complexity.

My thinking is that if we can produce something that is quite good relatively quickly, we should do that. When we actually have something, it will be easier to get help (or allocate funds) to improve it.

Are you able to create "quite good information relatively quickly" within your current mandate?

@merks
Owner Author

merks commented Nov 9, 2023

The contribution of the needed library to Orbit is effectively already prepared and only needs to be committed to main.

I picked the Eclipse Installer repository as a small illustrative example. It is a product (like the Eclipse SDK and like each of the EPP packages). Its p2 repository is transitively complete with respect to its dependencies/requirements, which is typically the case for product repositories as built by Tycho. Such a p2 repository is effectively an alternative representation of an actual product installation; in fact, it is like the union of all the different OSes and architectures supported by the product. The current prototype can already be applied to any p2 repository; if it's not transitively complete with respect to requirements, then there will be missing dependency links, but all else would look the same...
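A minimal Python sketch of what "transitively complete" means in practice: every requirement of every unit is provided by some unit in the same repository. The dictionary layout and the name `missing_requirements` are assumptions made for illustration, not the prototype's data model, and version ranges are ignored.

```python
def missing_requirements(units):
    """Report requirements that no unit in the repository provides.

    units: dict mapping a unit id to {"provides": set, "requires": set}.
    Returns a dict of unit id -> sorted unsatisfied requirement names;
    an empty dict means the repository is complete in this simplified sense.
    """
    provided = set()
    for unit in units.values():
        provided |= unit["provides"]

    missing = {}
    for unit_id, unit in units.items():
        unsatisfied = unit["requires"] - provided
        if unsatisfied:
            missing[unit_id] = sorted(unsatisfied)
    return missing
```

Each reported entry corresponds to a dependency link that would be absent from the generated SBOM.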

Given there appears to be significant interest, I can spend some more time, as time permits, to flesh this out further. As 2023-12 draws to a close, there aren't currently so many spare cycles...

@GuillaumeEscande

Hi, from my perspective, there are indeed two complexities:

  • The first one is to precisely determine the dependencies integrated into a generated product. This means identifying the components and versions precisely integrated into an Eclipse IDE bundle. Apparently, the proposed prototype is a very good starting point to extract this precise information (Is it possible to push the pom.xml and the necessary resources for the execution of this prototype? I would like to test it as part of internal experimentation).

  • The second complexity is to precisely extract the licenses of each of these components, and here it's very complicated for two reasons: (1) There is no standardization of PURL for p2 or OSGi packages. Therefore, it's nearly impossible to correlate a list of dependencies with a license database. (2) There is no standardization or best practice for declaring a license in OSGi bundle manifests. There are various methods, but there isn't really a control metric or verification step for the correct presence of this license. Dash is a good solution for finding licenses for some OSGi bundles, but without PURL or without clear rules on what the groupId and artifactId of a PURL should be, it limits the possibilities.

Would it not be possible to establish a control rule or best practice, already at the foundation's product level, to facilitate the adoption and standardization of this practice?

For example, the bundle org.eclipse.equinox.security.linux stores its license in the header of fragment.properties or in about.html, while the bundle org.eclipse.sdk stores its license in the headers of other files.

Perhaps the right path initially is to require/encourage the use of a dedicated property in the manifest to input license information in a format understandable by license detection tools like Dash or SPDX.
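As a sketch of how a tool might read such a manifest property, the Python below parses the main section of a MANIFEST.MF and picks out the identifying headers plus `Bundle-License`, a header that the OSGi specification already defines. Whether projects populate it consistently is exactly the open question; the function names are made up for illustration.

```python
def parse_manifest(text):
    """Parse the main section of a MANIFEST.MF, joining continuation lines."""
    headers = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and current:
            # Continuation lines start with a single space that is dropped.
            headers[current] += line[1:]
        elif ":" in line:
            current, value = line.split(":", 1)
            headers[current] = value.lstrip()
    return headers


def describe_bundle(manifest_text):
    """Return name, version, and declared license (None if absent)."""
    headers = parse_manifest(manifest_text)
    # Bundle-SymbolicName may carry directives such as ";singleton:=true".
    name = headers.get("Bundle-SymbolicName", "").split(";")[0].strip()
    return {
        "name": name,
        "version": headers.get("Bundle-Version"),
        "license": headers.get("Bundle-License"),
    }
```

A `None` license in the result would be the signal for falling back to about.html, an embedded pom, or an external database.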

@merks
Owner Author

merks commented Nov 15, 2023

It's on my TODO list to make this stuff available in a reusable form, i.e., in a form that one can build and then test it with an arbitrary p2 update site as input.

But with SimRel winding to a close for the December 6th release, I am swamped with other priorities.

https://github.com/eclipse-simrel

If you stay tuned here I will update you on progress and availability on this issue.


Yes, the prototype resolves requirements against capabilities to provide precise dependencies between components.

The lack of any type of standard PURL is kind of a problem and the group/name thing is just not really applicable in a space where the bundle symbolic name / version are the unique identifier.
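For discussion purposes only, here is a Python sketch of what a PURL built directly from the bundle symbolic name and version could look like, with no group/namespace segment. The `p2` type and the `classifier` qualifier are purely hypothetical; no such PURL type is assumed to be standardized. Only the general `pkg:type/name@version?qualifiers` shape comes from the PURL specification.

```python
from urllib.parse import quote


def bundle_purl(symbolic_name, version, purl_type="p2", qualifiers=None):
    """Build a PURL-shaped string from a bundle's symbolic name and version.

    purl_type defaults to "p2", which is a hypothetical type used only
    for illustration.
    """
    purl = f"pkg:{purl_type}/{quote(symbolic_name, safe='')}@{quote(version, safe='')}"
    if qualifiers:
        # Qualifiers are emitted in sorted key order for stable output.
        pairs = "&".join(
            f"{key}={quote(value, safe='')}" for key, value in sorted(qualifiers.items())
        )
        purl += "?" + pairs
    return purl
```

Usage would look like `bundle_purl("org.eclipse.core.runtime", "3.30.0", qualifiers={"classifier": "osgi.bundle"})`, where the qualifier name is again an assumption.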

The fact that license information can be sprinkled anywhere and everywhere in general is also a problem. Even if the stuff is supposed to be in a standard place in the pom, that doesn't mean folks (Maven artifacts) actually populate it at all, let alone correctly. At Eclipse, the standard approach is for each bundle to have an about.html that is also included in the binary plugin:

[image: about.html in a bundle]

Features do it somewhat differently, but also follow standard rules and conventions...

Perhaps we could simplify the license metadata by including information in the MANIFEST.MF as you suggest, but hell might freeze over before all the projects actually conform to such a new approach. I speak from experience with SimRel where it's a challenge to get projects to do anything. 😱

Thank you for your interest in this.

@merks
Owner Author

merks commented Nov 25, 2023

Note to self and others, this is related information:

package-url/purl-spec#272
https://github.com/eclipse/jbom

@merks
Owner Author

merks commented Jan 8, 2024

I have committed a relatively complete functional prototype to this public GitHub repo, pending any expressed interest in going further with this approach or simply reusing any parts for implementing a different approach:

https://github.com/merks/p2repo-sbom/

The repository provides an Oomph setup for creating a development environment automatically:

https://github.com/merks/p2repo-sbom/blob/main/CONTRIBUTING.md

A CI job to build the prototype product is available here:

https://ci.eclipse.org/cbi/view/p2RepoRelated/job/cbi.p2repo.sbom-build/

Product downloads are available here:

https://download.eclipse.org/cbi/updates/p2-sbom/products/nightly/latest

The following Jenkinsfile provides an example of how to use the prototype:

https://github.com/merks/p2repo-sbom/blob/main/SBOMGenerator.jenkinsfile

It's used by this job:

https://ci.eclipse.org/cbi/view/p2RepoRelated/job/cbi.p2repo.sbom-generator/

That job takes the Eclipse SDK 4.30 release as input:

https://download.eclipse.org/eclipse/updates/4.30/R-4.30-202312010110

and generates SBOMs in both XML and JSON format:

https://ci.eclipse.org/cbi/view/p2RepoRelated/job/cbi.p2repo.sbom-generator/lastSuccessfulBuild/artifact/
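For readers unfamiliar with the format, the Python sketch below assembles a very small CycloneDX-style JSON document with components and dependencies. It only illustrates the overall shape; the spec version `1.5`, the `name@version` bom-ref scheme, and the function name are assumptions, and the generator's real output contains much more detail.

```python
import json


def cyclonedx_json(components, dependencies):
    """Build a minimal CycloneDX-style JSON document.

    components: list of (name, version) pairs.
    dependencies: dict mapping a bom-ref to the bom-refs it depends on.
    """
    bom = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",  # assumed version, adjust as needed
        "version": 1,
        "components": [
            {
                "type": "library",
                "bom-ref": f"{name}@{version}",  # illustrative ref scheme
                "name": name,
                "version": version,
            }
            for name, version in components
        ],
        "dependencies": [
            {"ref": ref, "dependsOn": sorted(targets)}
            for ref, targets in sorted(dependencies.items())
        ],
    }
    return json.dumps(bom, indent=2)
```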

The license generation is a less-than-ideal hack...
