feat: Reduce logging level in sapphire-localnet Docker image #513

Merged: abukosek merged 1 commit into main from andrej/feature/sapphire-localnet-docker-improvements on Jan 30, 2024

abukosek
Contributor

@abukosek abukosek commented Jan 25, 2024

This PR adds the following changes to the sapphire-localnet Docker image:

  • Build oasis-node and oasis-net-runner from the master branch (needed until oasisprotocol/oasis-core@522aeda ends up in a stable release).
  • Reduce size of built simple-keymanager binary from 5MB to 3MB (this is accomplished by tweaking Rust build options).
  • Add env var to specify the log level of spawned nodes (OASIS_NODE_LOG_LEVEL).
  • Reduce log level of all spawned nodes from debug to warn, which should result in a much much lower disk consumption.
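
A minimal sketch of how a startup script could consume the new variable, assuming it falls back to warn when OASIS_NODE_LOG_LEVEL is unset. The helper name `resolve_log_level` and the list of accepted levels are illustrative assumptions, not the code in this PR.

```shell
# Hypothetical helper (not from this PR): choose the log level for spawned nodes.
# Falls back to "warn" when OASIS_NODE_LOG_LEVEL is unset, empty, or unrecognized.
resolve_log_level() {
  level="${OASIS_NODE_LOG_LEVEL:-warn}"
  case "$level" in
    debug|info|warn|error)
      printf '%s\n' "$level"
      ;;
    *)
      # Report the problem on stderr so stdout carries only the chosen level.
      echo "Unknown log level '$level'; using warn" >&2
      printf 'warn\n'
      ;;
  esac
}

# Possible use inside a startup script:
#   /spinup-oasis-stack.sh --log.level "$(resolve_log_level)"
```

Validating the value up front means a typo in the variable degrades to the quiet default instead of being passed to the node as an unknown level.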

TODO:

  • Measure disk space usage by logs at different log levels (debug, info, warn).
  • Backport the net runner log level changes to the stable/23.0.x branch.
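
One way the measurement could be scripted is sketched below. It assumes the logs are plain `*.log` files under a single directory; the helper names and the directory to scan are hypothetical and would need adjusting to wherever the container actually writes its logs.

```shell
# Hypothetical helper: print the total size in bytes of all *.log files
# found anywhere under the given directory.
log_bytes() {
  dir="$1"
  find "$dir" -type f -name '*.log' -exec cat {} + | wc -c | tr -d ' '
}

# Hypothetical helper: print log usage as an integer percentage of a total
# byte count (for example, the container's overall disk usage).
log_pct() {
  logs="$1"
  total="$2"
  echo $(( logs * 100 / total ))
}
```

Running `log_bytes` at fixed intervals for each log level (debug, info, warn) would give both the absolute growth and, via `log_pct`, the share of total disk usage.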

@abukosek abukosek force-pushed the andrej/feature/sapphire-localnet-docker-improvements branch from 29c114a to b7b5e07 Compare January 25, 2024 14:02

codecov bot commented Jan 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b9b8d38) 62.31% compared to head (a2f603a) 62.31%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #513   +/-   ##
=======================================
  Coverage   62.31%   62.31%           
=======================================
  Files          38       38           
  Lines        3962     3962           
=======================================
  Hits         2469     2469           
  Misses       1285     1285           
  Partials      208      208           


@abukosek abukosek marked this pull request as ready for review January 25, 2024 14:10
@@ -67,7 +67,7 @@ fi
T_START="$(date +%s)"

notice "Starting oasis-net-runner with ${CYAN}${PARATIME_NAME}${OFF}...\n"
-/spinup-oasis-stack.sh --log.level info 2>1 &>/var/log/spinup-oasis-stack.log &
+/spinup-oasis-stack.sh --log.level warn 2>1 &>/var/log/spinup-oasis-stack.log &
Member

@ptrus ptrus Jan 25, 2024

Can this be made configurable via an env var? Debug logs are really useful when investigating issues.

Contributor Author

Good idea, I'll add that :)

Member

I think we'd like at least info by default? I'd actually prefer debug, as I'm one of those who get to investigate issues from time to time, and not having debug logs is kind of painful, if not useless.

Contributor Author

The motivation behind this change was to reduce disk and CPU usage of the container, so that developers don't need to buy a 1TB SSD just to store some logs that they're not going to look at 99.9% of the time :)
If there's an issue, the env var that I'm going to add soon can be changed to debug and the developers can re-run their stuff and look at the logs.
Is this container used elsewhere where this kind of process isn't suitable?

Member

@ptrus ptrus Jan 25, 2024

👍

that developers don't need to buy a 1TB SSD just to store some logs that they're not going to look at 99.9% of the time

Yeah, so if this is the main/only focus, then logrotating the logs would work as well. Let's see what the numbers are with debug/info/warn logs. It would be interesting to see both the absolute usage of logs and the % relative to the rest of the container's disk usage.

My thinking is that if, with info, one is able to run the container for, let's say, two weeks without logs consuming too much disk space (both in % and in absolute terms), then I'm fine with setting the default to info and leaving it as it is. Otherwise, we should likely additionally implement some kind of logrotate to limit the log sizes.

Is this container used elsewhere where this kind of process isn't suitable?

Probably not.

Contributor Author

Sounds reasonable. I will make some measurements and we can decide then what to do :)

Member

thanks!

@ptrus
Member

ptrus commented Jan 25, 2024

which should result in a much much lower disk consumption.

Do we have some numbers for this? What's the approximate/estimated disk space usage for the container after 1 hour (or 10 minutes) in different settings, e.g. warn/info/debug?

Does this also reduce CPU usage? I thought reducing CPU usage was the initial motivation for this, but I doubted it would be noticeable, although I could be wrong.

If the only reason for doing this is reducing disk usage, we could also think about truncating logs (via logrotate or something).
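
As a rough illustration of the truncation idea, here is a sketch of a size-capped log in plain shell. It is a simple stand-in rather than logrotate itself: it discards old content instead of rotating it, and the function name and size limit are hypothetical.

```shell
# Hypothetical helper: empty a log file in place once it exceeds max_bytes.
# The file is truncated rather than deleted, so its path stays the same.
# Caveat: a writer that did not open the file in append mode may keep
# writing at its old offset after the truncation.
truncate_if_larger() {
  file="$1"
  max_bytes="$2"
  # Nothing to do if the log does not exist yet.
  [ -f "$file" ] || return 0
  size="$(wc -c < "$file" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    : > "$file"
  fi
}
```

Calling something like `truncate_if_larger /var/log/spinup-oasis-stack.log 104857600` periodically would cap that file at about 100 MB, at the cost of losing history, which is the trade-off a real rotation tool avoids by keeping older files.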

@abukosek
Contributor Author

Do we have some numbers for this?

Not yet, but I am planning to measure it.

@abukosek abukosek force-pushed the andrej/feature/sapphire-localnet-docker-improvements branch from b7b5e07 to a091242 Compare January 25, 2024 15:02
@@ -8,11 +8,15 @@ RUN cd oasis-web3-gateway && make && strip -S -x oasis-web3-gateway docker/commo
FROM ghcr.io/oasisprotocol/oasis-core-dev:stable-23.0.x AS oasis-core-dev

ENV OASIS_UNSAFE_SKIP_KM_POLICY=1
ENV OASIS_CORE_VERSION=master
Member

Would it make sense to backport the oasis-core fix into the stable/23.0.x branch now (even if there's no release yet), and build oasis-core from there? Master could technically get breaking changes merged in, which could cause things to fail.

Contributor Author

Yes, that would be the best solution for this.

Contributor Author

Done in oasisprotocol/oasis-core#5547 -- will change the branch now.

@CedarMist
Member

CedarMist commented Jan 29, 2024

which should result in a much much lower disk consumption.

Do we have some numbers for this? What's the approximate/estimated disk space usage for the container after 1 hour (or 10 minutes) in different settings, e.g. warn/info/debug?

With the latest sapphire-localnet container with debug logging, it's using approx. 8 MB per minute of disk space for logging; it's slowly thrashing my SSDs. That's about 11 GB per day, 300 GB per month, etc. This makes sense, as I've run out of space on my Docker partition a couple of times while the container was left running for a couple of weeks.

There's a lot of node gossip being logged, meaning it's very difficult to extract anything of relevance from the logs when something weird happens.
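
The extrapolation in this comment can be checked with shell integer arithmetic. This is a sketch that assumes a constant rate of 8 MB per minute and a 30-day month; the function name is illustrative.

```shell
# Project log growth in MB from a per-minute rate (integer MB) over a number of days.
projected_mb() {
  rate_mb_per_min="$1"
  days="$2"
  echo $(( rate_mb_per_min * 60 * 24 * days ))
}

projected_mb 8 1    # prints 11520  (about 11.5 GB per day)
projected_mb 8 30   # prints 345600 (about 345 GB per 30 days)
```

Both results are in line with the rough per-day and per-month figures quoted above.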

@abukosek abukosek force-pushed the andrej/feature/sapphire-localnet-docker-improvements branch from a091242 to 6b7663c Compare January 29, 2024 12:23
@ptrus
Member

ptrus commented Jan 29, 2024

With the latest sapphire-localnet container with debug logging, it's using approx. 8 MB per minute of disk space

Yeah, let's default to info or warn, depending on the difference in usage between info and warn.

@abukosek
Contributor Author

As promised, the graphs (this is just for an idling sapphire-localnet container -- if there are any transactions, logs will grow even more):
Versions: sapphire-localnet local (oasis-core: 23.0-gitd66f409, sapphire-paratime: 0.7.1-testnet, oasis-web3-gateway: 5.0.0-rc1-git64ecd67+dirty)

[Graph: pct-usage-by-logs]
[Graph: disk-usage-by-logs]

I'd prefer to set warn as the default, the user can always run the container with -e OASIS_NODE_LOG_LEVEL=info or -e OASIS_NODE_LOG_LEVEL=debug to get more verbose logs :)

@ptrus
Member

ptrus commented Jan 30, 2024

Ok agreed, thanks for the reports!

@abukosek abukosek force-pushed the andrej/feature/sapphire-localnet-docker-improvements branch from 6b7663c to a2f603a Compare January 30, 2024 14:32
@abukosek abukosek merged commit 1c3f6a2 into main Jan 30, 2024
8 checks passed
@abukosek abukosek deleted the andrej/feature/sapphire-localnet-docker-improvements branch January 30, 2024 15:26
3 participants