Pad 134 #254
base: main
looks good to me
@@ -10,7 +10,7 @@ Pillow>=8.3.2,<=9.5.0
 analytics-python
 nvidia-ml-py
 protobuf<=3.20.3
-tensorboard==2.10.1
+tensorboard==1.15
Hold up, this can't be right: this is a TensorBoard version for TensorFlow v1 from 2019.
Can we use a more modern version?
We can test a different version, but 2.10.1 fails end-to-end testing, and the failure was immortalized in Jira PAD-91:
https://hpe-aiatscale.atlassian.net/browse/PAD-91
103_run_mlde_validation_suite_against_rocm_on_grenoble/determined-...
tests/nightly/test_pytorch2.py::test_pytorch2_hf_language_modeling_distributed FAILED [100%]
The reason for the failure is:
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above
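To see which versions the failing environment actually resolves, something like the following could be run inside the test image. This is only a sketch: `installed_version` is a hypothetical helper name, and the list of distributions checked is an assumption about what is relevant here.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    # Assumed list of packages worth checking; adjust as needed.
    for name in ("tensorboard", "torch", "protobuf"):
        print(f"{name}: {installed_version(name)}")
```

If `tensorboard` reports `None` or an unexpected version, the pin in the requirements file may not be what ends up in the image.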
I feel strongly that we cannot pin 1.15 here: it's 5 years old, likely to conflict with other dependencies, and likely to have CVEs.
2.10.1 is also technically above 1.15...
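A quick sketch of the comparison behind that last point, assuming the minimum-version check is a plain numeric comparison of dotted release numbers. `parse_version` and `meets_minimum` are hypothetical helpers written for illustration, not part of TensorBoard or PyTorch.

```python
def parse_version(text):
    """Turn a dotted release string such as '2.10.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="1.15"):
    """Compare component by component, so 2.10.1 ranks above 1.15."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("1.14.0", "1.15", "2.10.1"):
        print(candidate, meets_minimum(candidate))
```

Under this assumption 2.10.1 satisfies "1.15 or above", which would suggest the ImportError comes from something other than the pinned number itself. The helper only handles plain numeric releases; pre-release strings such as `2.10.0rc1` would need a real parser like `packaging.version.Version`.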
VERSION
Outdated
@@ -1 +1 @@
-0.30.1
+0.31.1
We've just released 0.30.0. If this PR lands (and gets into bumpenvs) by EOD today, you can keep it at 0.30.1; otherwise it'll probably be 0.30.2.
Tensorboard and GPU kernel build fixes (PAD-91 and PAD-134)
Checklist
bumpenvs procedure in the determined repo. See README.