Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfile-default-rocm split into two separate files mainly aimed at different users. #272

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

will-HPE
Copy link
Contributor

Description

Checklist

  • [ ✓] Bump VERSION to make the pushed images are tagged with the right version.
  • Licenses should be included for new code which was copied and/or modified from any external code.
  • Test the images by running the test bumpenvs procedure in the determined repo. See README.

@will-HPE will-HPE requested a review from a team as a code owner July 16, 2024 18:17
@will-HPE will-HPE requested a review from jgongd July 16, 2024 18:17
@cla-bot cla-bot bot added the cla-signed label Jul 16, 2024
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# MAY NOT BE IMPORTANT ANYMORE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can remove it and if something breaks then we add it back in.



# THIS FIX IS FOR SAWMILL, UNCLEAR IF NECESSARY FOR GENERAL USERS
#TODO: is this necessary?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

@@ -0,0 +1,55 @@
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
#why no highlighting?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comment


# LIBFABRIC ISSUE
# USE CONDA FOR WORKAROUND
#TODO: MAY NOT BE A PROBLEM ANYMORE?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

#TODO: finish iterating here, preferably turn it into a shell script.
RUN if [ -n "$DEEPSPEED_PIP" ]; then DEBIAN_FRONTEND=noninteractive apt-get install -y pdsh libaio-dev&& git clone https://github.com/ROCmSoftwarePlatform/triton.git && cd triton && git checkout triton-mlir && cd python && pip3 install ninja cmake && python setup.py install;fi
RUN if [ -n "$DEEPSPEED_PIP" ]; then DEBIAN_FRONTEND=noninteractive apt-get install -y pdsh libaio-dev&& python -m pip install pydantic==1.10.11 && git clone https://github.com/ROCmSoftwarePlatform/DeepSpeed.git && cd DeepSpeed && python3 setup.py build && python3 setup.py install && python -m deepspeed.env_report; fi
RUN if [ -n "$DEEPSPEED_PIP" ]; then python -m deepspeed.env_report ; fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deepspeed section definitely needs cleanup

INFINITYHUB_PYTORCH_PREFIX := rocm/pytorch
INFINITYHUB_TENSORFLOW_PREFIX := rocm/tensorflow
INFINITYHUB_PYTORCH_VERSION := 2.1.2
INFINITYHUB_TENSORFLOW_VERSION :=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line

Copy link
Contributor

@MikhailKardash MikhailKardash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs CI build integration as well, so I expect there to be .circleci code changes.

ROCM_57_PREFIX := $(REGISTRY_REPO):rocm-5.7-
ROCM_60_PREFIX := $(REGISTRY_REPO):rocm-6.0-
ROCM_61_PREFIX := $(REGISTRY_REPO):rocm-6.1-
ROCM_60_TF_PREFIX := tensorflow-infinity-hub:tensorflow-infinity-hub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why other images are stored in REGISTRY_REPO := environments, or a repo with a -dev suffix, but this one is not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we test these images?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants