cosmos-node-exporter is a Prometheus scraper that scrapes some data to monitor your node. It exposes the following metrics:
- node status (voting power, whether the node is catching up or is stuck behind the blockchain)
- app version (local binary, latest GitHub/Gitopia release and if you are running the latest version)
- Cosmovisor metrics (version of Cosmovisor version itself)
- upgrades metrics (time till upgrade, upgrade version, if you have a binary prepared for the upgrade)
- chain metrics (cosmos-sdk version, Tendermint/CometBFT version, Go version/build tags)
- node params (minimum-gas-prices)
Specifically, if you are a validator or a node operator, you can set up alerting if:
- your app version does not match the latest on GitHub (can be useful to be notified on new releases)
- your voting power is 0 for a validator node
- your node is catching up
- there are chain upgrades your node does not have binaries for
- there's an upgrade coming soon
First, you need to download the latest release from the releases page. After that, you should unzip it, and you are ready to go:
wget <the link from the releases page>
tar <the filename you've just downloaded>
./cosmos-node-exporter <params>
Alternatively, install golang
(>1.18), clone the repo and build it:
git clone https://github.com/QuokkaStake/cosmos-node-exporter
cd cosmos-node-exporter
# This will generate a `cosmos-node-exporter` binary file in the repository folder
make build
# This will generate a `missed-blocks-checker` binary file in $GOPATH/bin
To run it in detached mode in background, first, we have to copy the file to the system apps folder:
sudo cp ./cosmos-node-exporter /usr/bin
Then we need to create a systemd service for our app:
sudo nano /etc/systemd/system/cosmos-node-exporter.service
You can use this template (change the user to whatever user you want this to be executed from. It's advised to create a separate user for that instead of running it from root):
[Unit]
Description=Cosmos Node Exporter
After=network-online.target
[Service]
User=<username>
TimeoutStartSec=0
CPUWeight=95
IOWeight=95
ExecStart=cosmos-node-exporter --config <path to config>
Restart=always
RestartSec=2
LimitNOFILE=800000
KillSignal=SIGTERM
[Install]
WantedBy=multi-user.target
Then we'll add this service to autostart and run it:
sudo systemctl daemon-reload # reflect changes in systemd files
sudo systemctl enable cosmos-node-exporter # enable service autostart
sudo systemctl start cosmos-node-exporter # start a service
sudo systemctl status cosmos-node-exporter # validate it's running
If you need to, you can also see the logs of the process:
sudo journalctl -u cosmos-node-exporter -f --output cat
Here's the example of the Prometheus config you can use for scraping data:
scrape-configs:
- job_name: 'cosmos-node-exporter'
scrape_interval: 10s
static_configs:
- targets: ['<your IP>:9500']
Then restart Prometheus and you're good to go!
Well, here's the app schema:
Sounds complex, huh? Let us explain.
We built this exporter to be as modular as possible, so it'd be easy to add new data fetching and new metrics. Here's some terms we use within the app:
Fetcher
- an entity that fetches data from remote source (like RPC node); it may require some data from other fetchersController
- an entity that fetches all the data from all provided Fetchers and generates a StateState
- an entity that represents an eventual result of all Fetchers executionGenerator
- an entity that generates some metrics based on State entityNodeHandler
- an entity that fetches data and generates metrics for a specific nodeApp
- an entity that spawns a bunch of NodeHandlers per each chain, then assembles and returns metrics to a user
This allows to build complex schemas (like, we don't need to fetch block time to calculate time till upgrade if there's no upgrade upcoming) and make it flexible and easy to add new Fetchers and Generators.
Fetchers can also be enabled/disabled, if a Fetcher is disabled, then it will provide no data and therefore Generator that uses the data from that Fetcher won't provide any metrics.
Here's a list of Generators:
Generator | Metrics returned | Per-node? | Requirements |
---|---|---|---|
AppVersionGenerator | cosmos-node-exporter version | No | |
UptimeGenerator | App launch timestamp, useful for annotations | No | |
CosmovisorUpgradesGenerator | Whether the Cosmovisor binary is present for the upgrade | Yes | Cosmovisor config and the upcoming upgrade |
CosmovisorVersionGenerator | Cosmovisor version | Yes | Cosmovisor config |
IsLatestGenerator | Whether the local version is the same or greater than the latest GitHub/Gitopia release | Yes | Cosmovisor config (for local version), Git config (for fetching remote version) |
LocalVersionGenerator | Local app binary version | Yes | Cosmovisor config |
NodeConfigGenerator | Node's minimum-gas-prices and halt-height | Yes | gRPC config, the chain should implement the cosmos.base.node.v1beta1/Config gRPC endpoint. |
NodeInfoGenerator | Running app version/git tag, cosmos-sdk version, Go version/build tags used to build it | Yes | gRPC config |
NodeStatusGenerator | Node's voting power, sync status, latest block time, node info, Tendermint/CometBFT version | Yes | Tendermint/CometBFT config |
RemoteVersionGenerator | Latest release of this app published | Yes | Git config (either Git or Gitopia) |
TimeTillUpgradeGenerator | Estimated upgrade time | Yes | Tendermint/CometBFT config (for fetching upgrade plan and block time) |
UpgradesGenerator | Upcoming upgrade info | Yes | Tendermint/CometBFT config |
Additionally, per each Fetcher, the app will return the list of actions it did (like, querying a node, getting GitHub latest release etc.)
and whether they were successful a node. The exporter itself should never return an error or crash (if it does, please file an issue),
instead it will return all the data it could get, and additionally it'll return a metrics set with all the actions it could
or couldn't do. You can set alerts based on that, for example, if node_status
action is failing for a big period of time,
likely the node is down.
All metrics are prefixed with cosmos_node_exporter_
, to get the list of all metrics, try something like
curl localhost:9500/metrics
on a fullnode the binary is running at and look at the results.
All configuration is done via .toml
config. Check config.example.toml for reference.
Bug reports and feature requests are always welcome! If you want to contribute, feel free to open issues or PRs.