Prometheus exporter for Lustre metrics.
Lustre Version | Exporter Tag |
---|---|
2.12 | v2.1.6 |
Clone the repository.
For just building the exporter:
make build
Building the exporter with code testing, formatting and linting:
make
Required for build with code linting:
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
A CentOS7 RPM package can be build by using a
- Shell build script in
rpm/build.sh
- Docker build container, see section below.
A Debian container is based on the offical golang:1.17.5-bullseye container image for just providing the binary.
# from repo base dir run
sudo docker build --tag lustre_exporter -f docker/Dockerfile .
sudo docker run --rm -v $PWD:/cpy -it lustre_exporter
The binary will be available in the repositories build
directory.
A CentOS7 container is based on the official CentOS7 container image for providing the exporter in a RPM package.
# from repo base dir run
sudo docker build -t rpm_dock -f docker/RPM-Dockerfile .
sudo docker run --rm -v $PWD:/rpm -it rpm_dock
The RPM package will be available in the repositories build
directory.
./lustre_exporter <flags>
- collector.ost=disabled/core/extended
- collector.mdt=disabled/core/extended
- collector.mgs=disabled/core/extended
- collector.mds=disabled/core/extended
- collector.client=disabled/core/extended
- collector.generic=disabled/core/extended
- collector.lnet=disabled/core/extended
- collector.health=disabled/core/extended
All above flags default to the value "extended" when no argument is submitted by the user.
Example: ./lustre_exporter --collector.ost=disabled --collector.mdt=core --collector.mgs=extended
The above example will result in a running instance of the Lustre Exporter with the following statuses:
- collector.ost=disabled
- collector.mdt=core
- collector.mgs=extended
- collector.mds=extended
- collector.client=extended
- collector.generic=extended
- collector.lnet=extended
- collector.health=extended
Flag Option Detailed Description
- disabled - Completely disable all metrics for this portion of a source.
- core - Enable this source, but only for metrics considered to be particularly useful.
- extended - Enable this source and include all metrics that the Lustre Exporter is aware of within it.
All Lustre procfs and procsys data from all nodes running the Lustre Exporter that we perceive as valuable data is exported or can be added to be exported (we don't have any known major gaps that anyone cares about, so if you see something missing, please file an issue!).
See the issues tab for all known issues.
Since the LNET metrics are currently exported based on the DebugFS it is required to have root priviliges or mount DebugFS to be access by different user.
To export that metrics running the exporter as Systemd service as root
user, the default user option prometheus
must be changed in the service file.
Example dashboards for Grafana are provided under the dashboards directroy.
It is highly recommended to also install the Prometheus Node Exporter to get a complete view within the provided dashboards.
In the event that you encounter issues with specific metrics (especially on versions of Lustre older than 2.12), please try disabling those specific troublesome metrics using the documented collector flags in the 'disabled' or 'core' state. Users have encountered bugs within Lustre where specific sysfs and procfs files miscommunicate their sizes, causing read calls to fail.
You are welcome to contribute to the project. Feel free to create an issue, pull request or just start a discussion.