diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9a03f342..fcd67815 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -15,10 +15,11 @@ You can contribute in the following ways: - Contribute changes to documentation by [submitting pull requests](#submit-pr) to it. **Contribute Code** -- [Resolve Issues](https://github.com/vmware/tern/issues). -- Improve the robustness of the project by - - [Adding to the Command Library](docs/adding-to-command-library.md). - - [Adding a Custom Template](docs/creating-custom-templates.md). +- [Resolve Issues](https://github.com/vmware/tern/issues) +- Improve the robustness of the project by: + - [Adding to the Command Library](docs/adding-to-command-library.md) + - [Adding a Custom Report Format](docs/creating-custom-templates.md) + - [Adding an Extension](docs/creating-tool-extensions.md) ## Am I Qualified to Contribute? diff --git a/README.md b/README.md index a5782bbb..91d20d41 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ Tern is a software package inspection tool for containers. It's written in Pytho - [Glossary of Terms](/docs/glossary.md) - [Architecture](/docs/architecture.md) - [Navigating the Code](/docs/navigating-the-code.md) + - [Data Model](/docs/data-model.md) - [Getting Started](#getting-started) - [Getting Started with Docker](#getting-started-with-docker) - [Getting Started with Vagrant](#getting-started-with-vagrant) @@ -27,10 +28,16 @@ Tern is a software package inspection tool for containers. 
It's written in Pytho - [JSON Format](#report-json) - [YAML Format](#report-yaml) - [SPDX tag-value Format](#report-spdxtagvalue) +- [Extensions](#extensions) + - [Scancode](#scancode) + - [cve-bin-tool](#cve-bin-tool) - [Running tests](#running-tests) - [Project Status](#project-status) -- [Documentation](#documentation) -- [Contributing](#contributing) +- [Contributing](/CONTRIBUTING.md) + - [Code of Conduct](/CODE_OF_CONDUCT.md) + - [Creating Report Formats](/docs/creating-custom-templates.md) + - [Creating Tool Extensions](/docs/creating-tool-extensions.md) + - [Adding to the Command Library](/docs/adding-to-command-library.md) # What is Tern? Tern is an inspection tool to find the metadata of the packages installed in a container image. The overall operation looks like this: @@ -202,6 +209,21 @@ $ tern -l report -f yaml -i golang:1.12-alpine -o output.yaml ``` $ tern -l report -f spdxtagvalue -i golang:1.12-alpine -o spdx.txt ``` + +# Extensions +Tern does not have its own file level license scanner. In order to fill in the gap, Tern allows you to extend container image analysis with an external file analysis CLI tool or Python3 module. + +## Scancode +[scancode-toolkit](https://github.com/nexB/scancode-toolkit) is a license analysis tool that "detects licenses, copyrights, package manifests and direct dependencies and more both in source code and binary files". To use it to analyze container images, run: +``` +$ tern -l report -x scancode -i golang:1.12-alpine +``` + +## cve-bin-tool +[cve-bin-tool](https://github.com/intel/cve-bin-tool) is a command line tool which "scans for a number of common, vulnerable components (openssl, libpng, libxml2, expat and a few others) to let you know if your system includes common libraries with known vulnerabilities". Vulnerability scanning tools can also be extended to work on containers using Tern, although support for certain metadata pertaining to CVEs may not be available yet. 
As a result, you will not see any of the results in the generated reports. To try it out, run: +``` +$ tern -l report -x cve_bin_tool -i golang:1.12-alpine +``` # Running tests WARNING: The `test_util_*` tests are not up to date. We are working on it :). From the Tern repository root directory run: @@ -225,10 +247,10 @@ Somewhere along the line of development, we accidentally rewrote git history on * [v0.2.0](docs/releases/v0_2_0.md) * [v0.1.0](docs/releases/v0_1_0.md) -## Documentation +## Documentation Architecture, function blocks, code descriptions and the project roadmap are located in the docs folder. Contributions to the documentation are welcome! See the [contributing guide](/CONTRIBUTING.md) to find out how to submit changes. -## Get Involved +## Get Involved Do you have questions about Tern? Do you think it can do better? Would you like to make it better? You can get involved by giving your feedback and contributing to the code, documentation and conversation! diff --git a/docs/adding-to-command-library.md b/docs/adding-to-command-library.md index 23c39b13..0d7e37fe 100644 --- a/docs/adding-to-command-library.md +++ b/docs/adding-to-command-library.md @@ -303,3 +303,5 @@ should always be 1 as you are querying for only one package name. As always, don't hesitate to ask questions by filing an issue with 'Question:' as the prefix of the subject, on the Slack channel or on the mailing list. + +[Back to the README](../README.md) diff --git a/docs/architecture.md b/docs/architecture.md index 44c21fff..0fd27d2e 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -3,12 +3,14 @@ You may want to look at the [glossary](./glossary.md) to understand the terms being used. ## Overall Approach -The general approach to finding package metadata given some files is to perform static analysis on the files. Tern's approach is more brute force - using the same tools that were used to install a package to retrieve the status of the package. 
This involves mounting the overlay filesystem and running commands against it at the most basic level. The results of the shell commands are then collated into a nice report showing which container layers brought in what packages. For compliance purposes (as Tern is a compliance tool), shell scripts to retrieve compliance information are used. Tern is also built to be a tool for guidance and hence will point out any missing or unknown information. For example, if it cannot find information about licenses, it will inform you of this and nudge you to either add information about it or a shell script to retrieve it. +Tern uses a "container aware" approach to analyze container images. Tern tries to find the tool or method used to install software components in the container and will use equivalent debugging methods to find the status of said software component. This involves mounting the overlay filesystem and running commands against it at the most basic level. The results of the shell commands are then collated into a nice report showing which container layers brought in what packages. Tern is also built to be a tool for guidance and hence will point out any missing or unknown information. For example, if it cannot find information about licenses, it will inform you of this or nudge you to either add information about it or a shell script to retrieve it. + +Tern can also use license file scanners to scan container images. This is not the "container aware" approach that the native analyzer uses. Instead Tern will just run the scanners on the filesystems in the container image. This approach works for current container distribution methods as all the layers are "pushed" to the container registry's repository belonging to the originator. As a result, the originator is obligated to provide sources for software components included in all layers. 
This is not the case for security scanning as the files at the lower layers may be patched in the later layers, thus invalidating the results from the lower layers. However, this doesn't stop you from using a security scanner on the container image. ## Guiding Principles 1. OSS Compliance First: Tern is a compliance tool at heart. Hence whatever it does is meant to help the user meet their open source compliance obligations, or, at the minimum, make them aware of the licenses governing the open source software they use. -2. Be Transparent: Tern will report on everything it is doing in order to not mislead on how it got its information -3. Inform not Ignore: Tern will report on all exceptions it has encountered during the course of its execution +2. Be Transparent: Tern will report on everything it is doing in order to not mislead on how it got its information. This allows Engineers to ascertain confidence in the results themselves. +3. Inform not Ignore: Tern will report on all exceptions it has encountered during the course of its execution. This, again, is to allow Engineers to make assessments on the efficacy of the tool, the results and the container image itself. ## Layout @@ -19,7 +21,7 @@ Here is a general architecture diagram: There are 4 components that operate together: ### The Cache -This is the database where filesystem identifiers can be queried against to retrieve package information. This is useful as many containers are based on other container images. If Tern had come across the same filesystem in another container, it can retrieve the package information without spinning up a container. Tern looks for filesystems here before doing any analysis. This is Tern's own data store which can be curated and culled over time. The reason that Tern keeps its own data store is because the filesystem artifacts that make up a container image are not necessarily how other compliance databases store license information. 
+This is the database where filesystem identifiers can be queried against to retrieve package information. This is useful as many containers are based on other container images. If Tern had come across the same filesystem in another container, it can retrieve the package information without spinning up a container. Tern looks for filesystems here before doing any analysis. This is Tern's own data store which can be curated and culled over time. The reason that Tern keeps its own data store is because the filesystem artifacts that make up a container image are not necessarily how other compliance databases store license information. The filesystems also follow their own method of identifying themselves. A container build is not reproducible, so often, even when the content of the filesystem has not changed, the container's checksum has and that makes it difficult to identify the contents of a container image. ### The Command Library This is a database of shell commands that may be used to create a container's layer filesystem. There are two types of shell commands - one for system wide package managers and one for custom shell commands or install scripts. The library is split in this way to account for situations where whole root filesystems are imported in order to create a new container. @@ -28,15 +30,17 @@ For example, in [this Dockerfile sample](../samples/debian_vim/Dockerfile), the Some acknowledgement should be made here that this is not the only way to create a container. However, it seems to be the most prevalent way right now. In the future, the community may move away from the Copy on Write filesystems and instead use one filesystem. If this were the case, Tern's job might be a bit easier, but this is just speculation. -### The Core -Currently, the code combines the functions of report formatting and image analysis and requires refactoring. 
At a conceptual level, the Core is what unbundles a container image, reads the accompanying config file, untars all the filesystems, mounts them (or not, depending on what kind of analysis the Command Library says to do), and sets up a chroot to run scripts (or not, again, something that it will find out from the Command Library). The general functionality of the core looks like this: +When Tern uses an external file scanner, it bypasses the command library altogether and instead relies on the external tool's results. The approach allows Engineers to compare results from different tools available to them as they have always done, but on container images. + +### The Analyzer +At a conceptual level, the Analyzer is what unbundles a container image, reads the accompanying config file, untars all the filesystems, mounts them (or not, depending on what kind of analysis and which analyzer is used), and sets up a chroot to run scripts (or not, again, depending on the analysis type). Tern has a dedicated analyzer to the type of image being analyzed. Currently, it can analyze only images created by Docker. The inspection part can be done using Tern's native analyzer or an external tool. The general flow of the native analyzer looks like this: Tern process flow -The core will collate the metadata it can get in Image objects which encapsulate data for each layer and each package found. It also encapsulates notes while execution takes place, so the report is transparent about what worked and what didn't. +The analyzer will collate the metadata it can get in Image objects which encapsulate data for each layer and each package found. It also encapsulates notes while execution takes place, so the report is transparent about what worked and what didn't. ### The Formatter -Tern's main purpose is to produce reports, either as an aid for understanding a container image or as a manifest for something else to consume. 
The default is the verbose report explaining where in the container the list of packages came from and what commands created them. The type of reports supported can be found in the project README. Also, take a look at the [template creation process](./creating-custom-templates.md) to get a better understanding of how Tern supports multiple formats. +Tern's main purpose is to produce reports, either as an aid for understanding a container image or as a manifest for something else to consume. The default is the verbose report explaining where in the container the list of packages came from and what commands created them. The type of reports supported can be found in the project README. Also, take a look at the [custom report format creation process](./creating-custom-templates.md) to get a better understanding of how Tern supports multiple formats. ## Objects Tern uses Object Oriented Programming concepts to encapsulate the data that it will be referencing during the course of execution. They can be found in the `tern/classes` directory. The general format is that an object of type Image contains a list of type ImageLayer and each ImageLayer contains a list of type Package. On top of that each of those objects contain an object of type Origins. The Origins object contains a list of type NoticeOrigin which contains a list of type Notice.
diff --git a/docs/creating-custom-templates.md b/docs/creating-custom-templates.md index 99e64f2c..61cc4cc7 100644 --- a/docs/creating-custom-templates.md +++ b/docs/creating-custom-templates.md @@ -110,3 +110,5 @@ Continuing with the example from step #2, if you wanted to enable your `custom` yaml = tern.formats.yaml.generator:YAML + custom = tern.formats.custom.generator:Custom ``` + +[Back to the README](../README.md) diff --git a/docs/creating-tool-extensions.md b/docs/creating-tool-extensions.md new file mode 100644 index 00000000..26780a9d --- /dev/null +++ b/docs/creating-tool-extensions.md @@ -0,0 +1,46 @@ +# Creating a Tool Extension +You can use Tern with another file or filesystem analysis tool to analyze container images. You can find examples of such tools in the `tern/extensions` folder. Currently two external tools are supported: +* [scancode-toolkit](https://github.com/nexB/scancode-toolkit): A license scanning tool that finds licenses in source code and binaries. Although support for formatting is not in place at the moment, it is something that will be completed in subsequent releases. +* [cve-bin-tool](https://github.com/intel/cve-bin-tool): A security vulnerability scanning tool that finds common vulnerabilities. Note that although you can use a security scanner with Tern, there isn't any support for reporting the results beyond printing them to console. This may change as the industry demand for security information in Software Bill of Materials seems to be on the rise. + +If you would like to make a tool extension, here are some general steps to follow: + +## 1. Familiarize yourself with Tern's Data Model + +The classes for the objects that are used to keep discovered metadata are in the `tern/classes` folder. Check out the [data model document](./data-model.md) for a general layout of the classes. Refer to the individual files for a list of properties. These store the supported metadata. 
If you do not see the metadata you are interested in, please submit a proposal issue to add this property to the appropriate class. This should be a reasonably trivial change with minimal effect on backwards compatibility. + +## 2. Create a plugin + +To create a plugin for the tool, create a folder under `tern/extensions` with the plugin name. Create an empty `__init__.py` file here and create a file called `executor.py`. This file will contain a class which is derived from the abstract base class in `executor.py`, located under `tern/extensions`. The `Executor` class requires you to implement the method `execute` which takes an object of type `Image` or any of its derived classes (for example `DockerImage`). You can use this method to call a library function or run a CLI command to collect the required information. Once done, you can set the properties of the `Image` object and the objects within it (see the data model as a reference, and the `.py` files in `tern/classes` for a list of properties you can set). + +You can refer to the existing plugins as a guide for implementing the `execute` method of your executor class. There are helper methods in `tern/analyze/passthrough.py` which you can make use of, or write your own implementation if you need to. + +## 3. Test your plugin + +To test your plugin, add the plugin to `setup.cfg` under `tern.extensions`. For example, let's say you have created a plugin called "custom" to run a custom script. Your plugin's `executor.py` should live in `tern/extensions/custom`.
You will add the plugin as follows: + +``` +tern.extensions = + cve_bin_tool = tern.extensions.cve_bin_tool.executor:CveBinTool + scancode = tern.extensions.scancode.executor:Scancode + custom = tern.extensions.custom.executor:Custom +``` + +To test out your plugin run: + +``` +$ pip install -e.[dev] +$ tern -l report -x custom -i <image:tag> +``` + +To test out the different formats for your plugin run: + +``` +$ tern -l report -x custom -f <format> -i <image:tag> +``` + +where `<format>` is one of Tern's supported formats. To see what formats are supported, run `tern report --help`. + +If you need a custom report format please refer to the document on [creating custom report formats](./creating-custom-templates.md). + +[Back to the README](../README.md) diff --git a/docs/data-model.md b/docs/data-model.md new file mode 100644 index 00000000..a00d5593 --- /dev/null +++ b/docs/data-model.md @@ -0,0 +1,11 @@ +# Tern's Data Model + +Tern stores metadata about the image and messages during operation in objects described here. The overall data model looks like this: + +![Tern data model](./img/tern_data_model.png) + +The main class is `Image` and its derived classes. This class contains a list of type `ImageLayer`. `ImageLayer` contains a list of type `Package`. `Image`, `ImageLayer` and `Package` contain a property called `origins` which is an object of type `Origins`. This class is used to record notes while Tern operates on an image such as what tools were used to retrieve the metadata or if the filesystem is of unknown content. `Origins` contains a list of type `NoticeOrigin` which contains a string and a list of type `Notice`. The `Notice` objects are where messages get recorded.
You can easily record a message in the `origins` property of `Image`, `ImageLayer` and `Package` objects by using the `add_notice_to_origins` method, which adds a `Notice` object to the `NoticeOrigin` object containing the origin string you give it (`origin_str` is a string indicating where in the image or analysis stage the event you want recorded occurred). + +You will also see a class called `Template`. This is an abstract base class used to make custom formats. To learn more see the [documentation on creating custom formats](./creating-custom-templates.md). + +[Back to the README](../README.md) diff --git a/docs/faq.md b/docs/faq.md index e16def39..5e46abcf 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -10,3 +10,5 @@ Static analysis is a reasonable approach to find software components and there a ## Why Python? Python is well suited for easy string formatting which is most of the work that Tern does. + +[Back to the README](../README.md) diff --git a/docs/glossary.md b/docs/glossary.md index ed185b92..03f715ed 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -2,10 +2,13 @@ - Command Library: Tern references a database of shell commands to determine what packages got installed. This is called the "Command Library". - Report: the artifact produced after running Tern. This is either a text document or a machine readable format. -- Image: A container image, typically following the [OCI image specification](https://github.com/opencontainers/image-spec/blob/master/spec.md) +- Image: A container image, typically created by [Docker](https://www.docker.com/) or following the [OCI image specification](https://github.com/opencontainers/image-spec/blob/master/spec.md) - Layer: A root filesystem or the difference between a previous filesystem and a new filesystem as created by storage drivers like AUFS or OverlayFS.
See the [OCI Image Layer specification](https://github.com/opencontainers/image-spec/blob/master/layer.md) for a general overview of how layer filesystems are created. - Package: A software package or library - Notice: A record of an incident that Tern came across during execution - Notice Origin: The location from which the Notice came. This can be the container or Dockerfile or Command Library or something in the development environment. - Cache: A database that associates container layer filesystems with the packages that were installed on them. Currently this is only represented by a yaml file and some CRUD operations against it. -- Dockerfile: A file containing instructions to the [Docker](https://docs.docker.com/engine/reference/commandline/build/) daemon on how to build a container. +- Dockerfile: A file containing instructions to the [Docker](https://docs.docker.com/engine/reference/commandline/build/) daemon on how to build a container image. +- Extension: An external tool Tern can use to analyze a container image + +[Back to the README](../README.md) diff --git a/docs/img/arch.png b/docs/img/arch.png index d93b8178..189629f2 100644 Binary files a/docs/img/arch.png and b/docs/img/arch.png differ diff --git a/docs/img/tern_data_model.png b/docs/img/tern_data_model.png new file mode 100644 index 00000000..ba963bfe Binary files /dev/null and b/docs/img/tern_data_model.png differ diff --git a/docs/navigating-the-code.md b/docs/navigating-the-code.md index c5f1e3d9..5cf64751 100644 --- a/docs/navigating-the-code.md +++ b/docs/navigating-the-code.md @@ -6,6 +6,8 @@ Here is a layout of the directory structure: ``` ▾ tern/ + __main__.py <-- Tern entry point + ▾ analyze/ <-- Each container image format will have its own subdirectory for analysis common.py <-- Common modules used at a high level throughout the code ▾ docker/ @@ -14,6 +16,7 @@ Here is a layout of the directory structure: dockerfile.py docker.py <-- modules specific to Docker run.py + ▾ classes/ <-- 
These are individual objects. Each has a corresponding test in the tests directory command.py docker_image.py @@ -24,10 +27,19 @@ Here is a layout of the directory structure: origins.py package.py template.py + ▾ command_lib/ <-- These are the bash commands that get run to collect information about the container image base.yml <-- System wide scripts for package managers and such snippets.yml <-- scripts for one off commands command_lib.py <-- Command Library modules + + ▾ extensions/ <-- This is the extension plugin library. An extension is an external tool Tern can use to analyze the filesystems in the container image + executor.py <-- This is the abstract base class that an extension plugin needs to inherit from + ▸ cve_bin_tool/ + executor.py + ▾ scancode/ + executor.py + ▾ formats/ <-- This is the reporting template plugin library. Each subdirectory is a module that can be dynamically loaded at runtime based on the users report selection generator.py <-- This is the abstract base class for report plugins ▾ default/ @@ -41,25 +53,29 @@ Here is a layout of the directory structure: generator.py ▾ yaml/ generator.py + ▾ report/ content.py errors.py formats.py report.py <-- Main reporting module + ▾ scripts/debian/ <-- Example script to pull sources for debian based images ▾ jessie/ sources.list apt_get_sources.sh + ▾ tools/ <-- Tools that can be used individually or by Tern fs_hash.sh verify_invoke.py + container_debug.py + ▾ utils/ <-- general utility modules used throughout the code cache.py constants.py general.py metadata.py rootfs.py - __main__.py <-- Tern entry point ``` Tests live outside of the `tern` folder, in a folder called `tests`. @@ -104,4 +120,6 @@ Some general rules about where the code is located: Code organization follows these general rules: - Each class is in its own file. - Utils are organized based on what they operate on. 
-- Subroutines that require the use of modules from all over the project live under the helper folder in high level files like `common.py` and `docker.py` +- Subroutines for use in general or under a particular "domain" can live in a default file name called `helpers.py`. You will find plenty of `helpers.py` files within the `analyze` folder. + +[Back to the README](../README.md) diff --git a/docs/project-roadmap.md b/docs/project-roadmap.md index 48fbd804..c20d9350 100644 --- a/docs/project-roadmap.md +++ b/docs/project-roadmap.md @@ -25,3 +25,5 @@ For Release 1.0.0 slated for November of 2019, we will focus on the following: This timetable is based on time, resources and feedback from you and will change accordingly. See archived roadmaps [here](project-roadmap-archive.md) + +[Back to the README](../README.md) diff --git a/docs/spdx-tag-value-mapping.md b/docs/spdx-tag-value-mapping.md index 38bd796c..348cf79a 100644 --- a/docs/spdx-tag-value-mapping.md +++ b/docs/spdx-tag-value-mapping.md @@ -149,3 +149,5 @@ If the SPDX document includes a File element for the Dockerfile being analyzed b * `SPDXRef-Dockerfile-1 METAFILE_OF SPDXRef-Image-1`: the Dockerfile is a metadata file that describes the Image If additional relationships would be useful beyond those currently specified by SPDX, there is an `OTHER` option, or additional relationship types could be proposed for inclusion in the SPDX specification. + +[Back to the README](../README.md) diff --git a/docs/spdx-tag-value-overview.md b/docs/spdx-tag-value-overview.md index acbc9afc..c58b1283 100644 --- a/docs/spdx-tag-value-overview.md +++ b/docs/spdx-tag-value-overview.md @@ -98,3 +98,5 @@ SPDXID: SPDXRef-DOCUMENT ``` Packages also must define a unique SPDX identifier, which must start with `SPDXRef-` followed by any unique combination of alphanumeric characters, `.` or `-`. An identifier might incorporate a human-relevant name for the package (e.g. 
`SPDXRef-requests`), or alternatively Package identifiers might just be sequential numbers (e.g. `SPDXRef-1`, `SPDXRef-2`, ...). + +[Back to the README](../README.md)
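
The identifier rule quoted in the `docs/spdx-tag-value-overview.md` context above (an ID starts with `SPDXRef-`, followed by alphanumeric characters, `.` or `-`) can be checked with a short regular expression. The following is a minimal sketch only: the names `SPDX_ID_PATTERN` and `is_valid_spdx_id` are hypothetical and are not part of Tern's code, and uniqueness within a document is not checked here.

```python
import re

# Hypothetical helper, not part of Tern: checks only the *shape* of an
# SPDX identifier as described in the overview document, i.e. the literal
# prefix "SPDXRef-" followed by one or more letters, digits, "." or "-".
SPDX_ID_PATTERN = re.compile(r"SPDXRef-[A-Za-z0-9.-]+")


def is_valid_spdx_id(identifier: str) -> bool:
    """Return True if the identifier has the SPDXRef-<idstring> form."""
    return SPDX_ID_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("SPDXRef-requests", "SPDXRef-1", "SPDXRef-my_pkg", "requests"):
        print(candidate, is_valid_spdx_id(candidate))
```

A check like this only validates the format; whether identifiers are unique across the generated document would still need to be tracked separately, for example with a set of identifiers already issued.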