Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the ClearlyDefined certifier #167

Merged
merged 1 commit into from
Oct 31, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 102 additions & 0 deletions guac-clearly-defined-certifier.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
---
layout: page
title: ClearlyDefined certifier
permalink: /certifier-clearlydefined/
---

### Overview

GUAC (Graph for Understanding Artifact Composition) integrates with
**ClearlyDefined** (https://clearlydefined.io/?sort=releaseDate&sortDesc=true)
to enhance supply chain transparency by retrieving accurate license data for
software dependencies. This functionality helps organizations make informed
decisions about software licenses when managing their dependencies.

### How GUAC Pulls from ClearlyDefined

GUAC accesses ClearlyDefined data through a **certifier process** instead of a
collector model. This choice ensures that queries are **re-run on a scheduled
basis** to reflect any updated license information, given that new software
releases can alter licensing data over time. The certifier’s role complements
GUAC’s use of SBOM (Software Bill of Materials) data, allowing it to validate or
augment license details.

- **Invocation**:

- The certifier runs periodically or can be triggered during **SBOM
ingestion** to pull data directly from ClearlyDefined.
- However, including ClearlyDefined queries during SBOM ingestion may **slow
down processing**, especially for larger datasets.
- For further details visit
https://docs.guac.sh/guac-configuration/#certifier-configuration.

- **Mapping and Query System**:
- ClearlyDefined identifies software using **coordinates**, which GUAC maps to
**pURLs (package URLs)**. A custom mapping library facilitates this
conversion to align the two systems.

### What is and is not Supported

1. **Supported Features**:

- GUAC retrieves detailed license data from ClearlyDefined.
- Both SBOM-derived and ClearlyDefined license data are preserved, allowing
users to make manual decisions in case of conflicts.
- **CertifyLegal nodes** are created for both sources, ensuring complete
graph observability.

2. **Limitations**:
- **Batched queries** are not supported by ClearlyDefined’s API, which can
result in rate limits and slower performance for large dependency graphs.
- The system does not automatically resolve conflicting data between SBOMs
and ClearlyDefined; users must determine the most trustworthy source
themselves.

### Invocation Process

GUAC integrates with ClearlyDefined in three ways:

1. **Scheduled Certifier Execution**

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a third option: running guacone certifier cd from the command line

- Automatically runs at specified intervals to keep the dependency data up to
date.

2. **On-Demand during SBOM Ingestion**

- Queries ClearlyDefined in real time during the ingestion of SBOMs, which
may slow down the processing.

3. **Manual Command Line Execution**
- This method provides greater control, allowing users to run queries
whenever needed.

---

Below is a table of the supported **command-line arguments** for the
**ClearlyDefined certifier**. These flags allow users to control input, output,
and execution behavior.

| **Argument** | **Description** | **Example** |
| ---------------------- | ----------------------------------------------- | --------------------------- |
| `--help` | Displays help message. | `--help` |
| `--input <path>` | Specifies input directory or file. | `--input ./sboms` |
| `--output <path>` | Sets output directory. | `--output ./results` |
| `--format <type>` | Sets output format (e.g., `json`, `spdx`). | `--format json` |
| `--log-level <lvl>` | Adjusts log level (`info`, `debug`, `error`). | `--log-level debug` |
| `--poll-interval <n>` | Sets polling interval in minutes (daemon mode). | `--poll-interval 10` |
| `--certify-only` | Runs only certification, skipping ingestion. | `--certify-only` |
| `--collector <type>` | Specifies collector (e.g., `gcs`, `github`). | `--collector gcs` |
| `--daemon` | Runs in daemon mode for continuous polling. | `--daemon` |
| `--config <path>` | Loads custom configuration file. | `--config ./config.yaml` |
| `--api-endpoint <url>` | Sets API endpoint for results. | `--api-endpoint http://...` |

---

### Key Notes:

- **Daemon mode**: When invoked with `--daemon`, the certifier continuously
polls the specified collector (like `gcs`) to fetch updates.
- **Poll interval**: Default polling may be every 5 or 10 minutes, configurable
via the `--poll-interval` argument.
- **Formats**: Supported output formats include `json`, `spdx`, or other
standards recognized by the tool.
Loading