[BE] consolidate 4-GPU integration tests into 8-GPU tests and reduce frequency

[ghstack-poisoned]
tianyu-l committed Dec 18, 2024
1 parent 7c4e103 commit d465b9b
Showing 4 changed files with 8 additions and 55 deletions.
46 changes: 0 additions & 46 deletions .github/workflows/integration_test_4gpu.yaml

This file was deleted.

10 changes: 7 additions & 3 deletions .github/workflows/integration_test_8gpu.yaml
@@ -5,8 +5,8 @@ on:
branches: [ main ]
pull_request:
schedule:
- # Runs nightly
- - cron: '0 0 * * *'
+ # Runs every 6 hours
+ - cron: '0 */6 * * *'
concurrency:
group: unit-test${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_number || github.ref }}
cancel-in-progress: true
@@ -21,7 +21,7 @@ jobs:
with:
runner: linux.g5.48xlarge.nvidia.gpu
gpu-arch-type: cuda
- gpu-arch-version: "12.1"
+ gpu-arch-version: "12.4"
# This image is faster to clone than the default, but it lacks CC needed by triton
# (1m25s vs 2m37s).
docker-image: torchtitan-ubuntu-20.04-clang12
@@ -37,5 +37,9 @@ jobs:
pip config --user set global.progress_bar off
python -m pip install --force-reinstall --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
# install torchtitan to test the files in ./scripts
python -m pip install -e .
mkdir artifacts-to-be-uploaded
python ./tests/integration_tests.py artifacts-to-be-uploaded --ngpu 8
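
The schedule hunk above lists `'0 0 * * *'` first and `'0 */6 * * *'` second. Assuming the usual diff ordering (removed lines before added lines), the 8-GPU workflow moves from one scheduled run per day to four. The sketch below counts scheduled runs per day for these two expressions. It is an illustration only: `matching_hours` and `runs_per_day` are hypothetical helpers, not part of torchtitan or GitHub Actions, and they handle only the hour-field forms that appear in this diff.

```python
from typing import List


def matching_hours(hour_field: str) -> List[int]:
    """Expand a cron hour field (hypothetical helper).

    Supports only '*', '*/N', or a single integer such as '0'.
    """
    if hour_field == "*":
        return list(range(24))
    if hour_field.startswith("*/"):
        step = int(hour_field[2:])
        return list(range(0, 24, step))
    return [int(hour_field)]


def runs_per_day(cron_expr: str) -> int:
    """Count daily runs for a 5-field cron expression (hypothetical helper).

    Assumes the minute field is a single integer and that the
    day-of-month, month, and day-of-week fields are all '*'.
    """
    fields = cron_expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields: " + cron_expr)
    int(fields[0])  # raises ValueError if the minute is not a single integer
    return len(matching_hours(fields[1]))


print(runs_per_day("0 0 * * *"))    # nightly schedule
print(runs_per_day("0 */6 * * *"))  # every-6-hours schedule
print(matching_hours("*/6"))        # hours at which the job fires
```

GitHub Actions evaluates `schedule` cron expressions in UTC, so the four daily runs fall at 00:00, 06:00, 12:00, and 18:00 UTC.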
1 change: 0 additions & 1 deletion README.md
@@ -1,4 +1,3 @@
- [![4 GPU Integration Test](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_4gpu.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_4gpu.yaml?query=branch%3Amain)
[![8 GPU Integration Test](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu.yaml?query=branch%3Amain)

# torchtitan
6 changes: 1 addition & 5 deletions tests/integration_tests.py
@@ -471,10 +471,6 @@ def run_tests(args):
f"Skipping test {test_flavor.test_name} that requires {test_flavor.ngpu} gpus,"
f" because --ngpu arg is {args.ngpu}"
)
- elif args.ngpu == 8 and test_flavor.ngpu != 8:
- logger.info(
- f"Skipping non-8gpu test {test_flavor.test_name} on 8-gpu runner"
- )
else:
run_test(test_flavor, full_path, args.output_dir)

@@ -488,7 +484,7 @@ def main():
default="all",
help="test to run, acceptable values: `test_name` in `build_test_list` (default: all)",
)
- parser.add_argument("--ngpu", default=4, type=int)
+ parser.add_argument("--ngpu", default=8, type=int)
args = parser.parse_args()

if not os.path.exists(args.output_dir):
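
The effect of dropping the `args.ngpu == 8 and test_flavor.ngpu != 8` branch can be sketched as follows. This is a minimal illustration, not the code in `tests/integration_tests.py`: `Flavor` and `should_run` are hypothetical names, and the first skip condition (`ngpu < flavor.ngpu`) is an assumption inferred from the "requires ... gpus" log message, since that condition is not visible in the hunk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flavor:
    """Hypothetical stand-in for a test flavor with a GPU requirement."""
    test_name: str
    ngpu: int


def should_run(flavor: Flavor, ngpu: int, legacy_8gpu_filter: bool = False) -> bool:
    """Decide whether a flavor runs for a given --ngpu value.

    legacy_8gpu_filter=True models the branch this commit deletes:
    on an 8-GPU run, only flavors that need exactly 8 GPUs were run.
    """
    # Assumed condition: skip tests that need more GPUs than are available.
    if ngpu < flavor.ngpu:
        return False
    # Deleted branch: skip non-8-GPU tests on the 8-GPU runner.
    if legacy_8gpu_filter and ngpu == 8 and flavor.ngpu != 8:
        return False
    return True


def selected(flavors: List[Flavor], ngpu: int, legacy: bool = False) -> List[str]:
    """Return the names of the flavors that would run."""
    return [f.test_name for f in flavors if should_run(f, ngpu, legacy)]


flavors = [Flavor("two_gpu", 2), Flavor("four_gpu", 4), Flavor("eight_gpu", 8)]
print(selected(flavors, ngpu=8, legacy=True))   # behavior before the commit
print(selected(flavors, ngpu=8, legacy=False))  # behavior after the commit
```

Under these assumptions, an 8-GPU run now picks up the smaller tests that the deleted 4-GPU workflow used to cover, which is what lets the two workflows be consolidated.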