From NVIDIA Megatron-LM for visibility #18
base: multi-query-attention
Commits on Sep 5, 2024
-
Commit 08e245d
-
Merge branch 'mblaz/fast-load-broadcast' into 'main'
Optimize broadcasted data during parallel load See merge request ADLR/megatron-lm!1968
Commit 5b73de7
-
Commit 6701e08
-
Merge branch 'dnarayanan/distributed_optimizer_readme_fixes' into 'main'
Fix description of distributed optimizer workflow See merge request ADLR/megatron-lm!1951
Commit 3396356
-
Commit 033d8b0
-
Merge branch 'kunlunl/native_fp8_2' into 'main'
Add native-fp8 See merge request ADLR/megatron-lm!1669
Commit 01945b9
-
Commit f0161d2
-
Merge branch 'mblaz/dist-ckpt-pyt2.4' into 'main'
Restore the actual PyT 2.4 fix from !1970 See merge request ADLR/megatron-lm!2039
Commit 7580748
-
Commit a61150d
-
Merge branch 'ko3n1g/tests/disable-mamba-test' into 'main'
tests: Skip flaky mamba test See merge request ADLR/megatron-lm!2044
Commit 2169674
-
Commit cb979cf
-
Merge branch 'ko3n1g/ci/bump-sha' into 'main'
ci: Bump reference sha See merge request ADLR/megatron-lm!2048
Commit 38873f5
-
ADLR/megatron-lm!2029 - Add model config files for Mixtral-8x7B and Mixtral-8x22B performance benchmarking
Commit 7ef8b3f
-
Merge branch 'xuwenc/release_moe_benchmarking' into 'main'
Add model config files for Mixtral-8x7B and Mixtral-8x22B performance benchmarking See merge request ADLR/megatron-lm!2029
Commit 5ec1e29
-
ADLR/megatron-lm!1881 - Uneven Pipeline Parallelism
Co-authored-by: William Dykas <wdykas@cw-dfw-cs-001-dc-02.cm.cluster> Co-authored-by: William Dykas <wdykas@cw-dfw-cs-001-dc-01.cm.cluster> Co-authored-by: William Dykas <wdykas@cs-cw-dfw-login-01.cm.cluster> Co-authored-by: William Dykas <wdykas@cs-cw-dfw-dc-02.cm.cluster>
Commit fa8bb59
-
Merge branch 'uneven-pipeline' into 'main'
Uneven Pipeline Parallelism See merge request ADLR/megatron-lm!1881
Commit 60d03fd
-
ADLR/megatron-lm!1912 - Add support for pytorch tensorboard profiler
Co-authored-by: Jon Barker <jbarker@draco-oci-dc-01.cm.cluster>
Commit 86df799
-
Merge branch 'jbarker/pt-profiler' into 'main'
Add support for pytorch tensorboard profiler See merge request ADLR/megatron-lm!1912
Commit cb4ce23
-
Commit dd876ba
-
Merge branch 'ko3n1g/tests/release-training-load-path' into 'main'
ci: Pass `LOAD_PATH` into training See merge request ADLR/megatron-lm!2050
Commit 4a756e2
Commits on Sep 6, 2024
-
ADLR/megatron-lm!1958 - Update check_param_hashes_across_dp_replicas to return true if hashes across all DP ranks match.
Commit 8f19bcd
-
Merge branch 'akoumparouli/check_param_hashes_across_dp_replicas_fix' into 'main'
Update check_param_hashes_across_dp_replicas to return true if hashes across all DP ranks match. See merge request ADLR/megatron-lm!1958
Commit 732a689
-
ADLR/megatron-lm!1796 - Per layer cudagraph support for GPT training with Transformer Engine modules
Commit 43ee4b8
-
Merge branch 'auto_cudagraph' into 'main'
Per layer cudagraph support for GPT training with Transformer Engine modules See merge request ADLR/megatron-lm!1796
Commit 9366f3c
-
ADLR/megatron-lm!2053 - Update model config files for Mixtral-8x7B and Mixtral-8x22B performance benchmarking
Commit 8499f26
-
Merge branch 'xuwenc/release_moe_benchmarking' into 'main'
Update model config files for Mixtral-8x7B and Mixtral-8x22B performance benchmarking See merge request ADLR/megatron-lm!2053
Commit 3728c67
-
ADLR/megatron-lm!1971 - Revert "ADLR/megatron-lm!1747 - Use TP-CP group for fp8 amax reduction"
Commit 98abe37
-
Merge branch 'amax_red' into 'main'
Revert "ADLR/megatron-lm!1747 - Use TP-CP group for fp8 amax reduction" See merge request ADLR/megatron-lm!1971
Commit a2b6ee4
-
Commit 8f331e8
-
Merge branch 'denliu/fp8_moe' into 'main'
FP8 support for MoE with conservative recipe Closes #43 See merge request ADLR/megatron-lm!1089
Commit cc16182
-
Commit 9a0e78d
-
Merge branch 'mblaz/fix-deprecation-notice' into 'main'
Fix `zarr` deprecation notice See merge request ADLR/megatron-lm!2042
Commit 7a113e7
Commits on Sep 7, 2024
-
ADLR/megatron-lm!1859 - Skierat/fully parallel local
Co-authored-by: Mikołaj Błaż <mblaz@nvidia.com> Co-authored-by: Slawek Kierat <skierat@skierat-mlt.client.nvidia.com> Co-authored-by: Jakub Szulc <jszulc@nvidia.com> Co-authored-by: Slawomir Kierat <skierat@dgx1v-loki-25.nvidia.com>
Commit 3fb5c51
-
Merge branch 'skierat/fully-parallel-local' into 'main'
Skierat/fully parallel local See merge request ADLR/megatron-lm!1859
Commit 8252432
-
ADLR/megatron-lm!1630 - Runtime upcycling support for MoE
Co-authored-by: Zijie Yan <zijiey@nvidia.com> Co-authored-by: Abhinav Khattar <akhattar@nvidia.com> Co-authored-by: Ethan He <yihuih@nvidia.com>
Commit 6c3ada7
-
Merge branch 'runtime-upcycling' into 'main'
Runtime upcycling support for MoE See merge request ADLR/megatron-lm!1630
Commit f5667db
-
ADLR/megatron-lm!2052 - updates import for fault_tolerance package to nvidia_resiliency_ext.fault_tolerance
Commit 80e3863
-
Merge branch 'nvidia_resiliency_ext' into 'main'
updates import for fault_tolerance package to nvidia_resiliency_ext.fault_tolerance See merge request ADLR/megatron-lm!2052
Commit 5019bb4
-
Commit c14d987
-
Merge branch 'ko3n1g/tests/fix-mixtral-tests' into 'main'
tests: Move mixtral locations See merge request ADLR/megatron-lm!2056
Commit 79b448a
-
Commit 7053e64
-
Merge branch 'ko3n1g/ci/bump-sha-2' into 'main'
ci: Bump sha See merge request ADLR/megatron-lm!2055
Commit c2d7e2f
-
ADLR/megatron-lm!1926 - Adding T5 release test
Co-authored-by: Huy Vu <huvu@cs-oci-ord-login-01.cm.cluster> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com>
Commit 759d787
-
Merge branch 'huvu/mcore_t5_release_test' into 'main'
Adding T5 release test See merge request ADLR/megatron-lm!1926
Commit 6c49616
-
ADLR/megatron-lm!1990 - Mitigate slow loops in set_is_first_minibatch and zero_grad_buffers
Co-authored-by: Jon Barker <jbarker@draco-oci-dc-01.cm.cluster>
Commit ab5624b
-
Merge branch 'jbarker/remove_the_bad_loops' into 'main'
Mitigate slow loops in set_is_first_minibatch and zero_grad_buffers See merge request ADLR/megatron-lm!1990
Commit cb42680
-
Commit 7adc86e
-
Merge branch 'dnarayanan-main-patch-45127' into 'main'
Fix bug in docstrings in `megatron/core/parallel_state.py` See merge request ADLR/megatron-lm!1882
Commit 5747146
-
ADLR/megatron-lm!1975 - Refactor distributed optimizer communication code into megatron/core/distributed
Commit 655a663
-
Merge branch 'dnarayanan/dist_optimizer_refactor' into 'main'
Refactor distributed optimizer communication code into megatron/core/distributed See merge request ADLR/megatron-lm!1975
Commit 6ac4db0
Commits on Sep 8, 2024
-
Commit 8d62160
-
Merge branch 'ko3n1g/ci/maybe-cherry-pick-commit' into 'main'
ci: Automated cherry-picking See merge request ADLR/megatron-lm!2046
Commit 8b0a9b3
-
Commit 56ddf9a
-
Merge branch 'bump-sha-3' into 'main'
ci: Bump sha See merge request ADLR/megatron-lm!2060
Commit 8e21350
-
Commit a604c95
-
Merge branch 'ko3n1g/ci/allow-skipping-unittests' into 'main'
ci: Allow skipping unit tests See merge request ADLR/megatron-lm!2061
Commit 46b850f
-
Commit 4a47180
-
Merge branch 'ko3n1g/ci/automate-release-branch' into 'main'
ci: Automate cut-off of release branch See merge request ADLR/megatron-lm!2062
Commit dccb6df
-
Commit eb7418f
-
Merge branch 'ko3n1g/ci/fix-mirroring' into 'main'
ci: Fixes for mirroring and cherry picking See merge request ADLR/megatron-lm!2064
Commit 8307fcd
Commits on Sep 9, 2024
-
Commit 0b5bc5e
-
Merge branch 'ko3n1g/ci/fix-mirroring-2' into 'main'
ci: Use PAT for mirroring See merge request ADLR/megatron-lm!2066
Commit 27c3737
-
Commit 6dade5f
-
Merge branch 'ko3n1g/ci/skip-cherrypicking-on-empty-labels' into 'main'
ci: Skip cherry-pick on empty label See merge request ADLR/megatron-lm!2068
Commit 90cd925
Commits on Sep 10, 2024
-
Commit bef7771
-
Merge branch 'tshiri/format_before_real_change' into 'main'
Fix lint errors in prepartion for other MRs See merge request ADLR/megatron-lm!2051
Commit d0f5aa9
-
Commit aae7237
-
Merge branch 'ko3n1g/ci/repeat-unittests' into 'main'
ci: Repeat unit tests 5 times See merge request ADLR/megatron-lm!2079
Commit b28d445
-
Commit c290133
-
Merge branch 'zijiey/skip_upcycling_ut' into 'main'
Skip the upcycling UT. See merge request ADLR/megatron-lm!2081
Commit 522d8a3
-
Commit f03af48
-
Merge branch 'xiny/fix_moe_nightly_test' into 'main'
Update Golden Values for MoE Nightly Tests See merge request ADLR/megatron-lm!2067
Commit bbecd08
-
Commit 6a89bc7
-
Merge branch 'ko3n1g/ci/cherry-pick-project' into 'main'
ci: Cherry-pick into the right project See merge request ADLR/megatron-lm!2083
Commit 3bdae05
-
Commit e93d566
-
Merge branch 'pstjohn/pyproject.toml' into 'main'
expanding pyproject.toml definitions for uv See merge request ADLR/megatron-lm!2084
Commit b6887d3
-
Commit 1ea3918
-
Merge branch 'edits-rebased' into 'main'
copyedits try 3 : pure doc changes See merge request ADLR/megatron-lm!1931
Commit db0fc33
Commits on Sep 11, 2024
-
ADLR/megatron-lm!2086 - Add Encoder-Decoder Parallelism Documentation
Co-authored-by: Mike Chrzanowski <mchrzanowski@draco-oci-dc-01.cm.cluster>
Commit f218582
-
Merge branch 'mike/add_encoder_doc' into 'main'
Add Encoder-Decoder Parallelism Documentation See merge request ADLR/megatron-lm!2086
Commit fe1640a
-
ADLR/megatron-lm!1699 - MoE Shared Expert support
Co-authored-by: Zijie Yan <zijiey@nvidia.com> Co-authored-by: tongliu <tongliu@nvidia.com> Co-authored-by: Dennis Liu <denliu@nvidia.com>
Commit 1fa9464
-
Merge branch 'hongxiaob/shared_expert' into 'main'
MoE Shared Expert support Closes #134 See merge request ADLR/megatron-lm!1699
Commit fec11a7
-
Commit 6e4e9df
-
Merge branch 'zijiey/moe_interface_tests' into 'main'
Add MoE interface tests and move other tests to internal See merge request ADLR/megatron-lm!2088
Commit 8fc7553
-
Commit 2130890
-
Merge branch 'ko3n1g/ci/bump-sha-3' into 'main'
ci: Bump reference sha See merge request ADLR/megatron-lm!2092
Commit 6664dc6
-
Commit 32949f2
-
Merge branch 'ko3n1g/ci/disable-broken-test' into 'main'
ci: Disable broken test See merge request ADLR/megatron-lm!2093
Commit df1418a
-
Commit f8b7c3f
-
Merge branch 'trintamaki/multi-image-multi-tile-dataloader-seq-len' into 'main'
Multimodal sequence length optimizations See merge request ADLR/megatron-lm!1985
Commit 6151709
-
Commit 3005d02
-
Merge branch 'ko3n1g/tests/flaky-test-2' into 'main'
tests: Disable flaky test See merge request ADLR/megatron-lm!2094
Commit 9ec2337
Commits on Sep 12, 2024
-
Commit e5fb1fa
-
Merge branch 'ko3n1g/ci/repeat-mrs' into 'main'
tests: Repeat MRs 5 times See merge request ADLR/megatron-lm!2004
Commit 028b777
-
ADLR/megatron-lm!2091 - Don't pass device_id to torch.distributed.init_process_group, it causes hangs
Co-authored-by: Szymon Migacz <1934379+szmigacz@users.noreply.github.com>
Commit dcc6634
-
Merge branch 'no_dist_device_id' into 'main'
Don't pass device_id to torch.distributed.init_process_group, it causes hangs See merge request ADLR/megatron-lm!2091
Commit 76f9f48
Commits on Sep 14, 2024
-
Commit bf7b978
-
Merge branch 'ko3n1g/ci/release-tests' into 'main'
ci: Add release tests for 0.9 See merge request ADLR/megatron-lm!2059
Commit 21924d8
Commits on Sep 17, 2024
-
ADLR/megatron-lm!2106 - fix: allow merge request CI for non-protected branches to fail
Commit e6f1d81
-
Merge branch 'terryk/ci-can-fail-on-unprotected-targets' into 'main'
fix: allow merge request CI for non-protected branches to fail See merge request ADLR/megatron-lm!2106
Commit 6562666
-
Commit 0902af0
-
Merge branch 'ko3n1g/chore/formatting-on-release-branch' into 'main'
chore: Fix autoformatter for release branches See merge request ADLR/megatron-lm!2107
Commit 72008a0
-
ADLR/megatron-lm!2104 - Fixing broken links
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Shanmugam Ramasamy committed Sep 17, 2024
Commit 2a8d8af
-
Merge branch 'docFix' into 'main'
Fixing broken links See merge request ADLR/megatron-lm!2104
Shanmugam Ramasamy committed Sep 17, 2024
Commit 3f10ff6
-
Commit 71d8ce7
-
Merge branch 'add-video-handling' into 'main'
Add video handling into multimodal mcore See merge request ADLR/megatron-lm!2072
Commit 0bda578
Commits on Sep 18, 2024
-
Commit ab7f706
-
Merge branch 'lora_cg' into 'main'
Enable optional kwargs with CUDA graph See merge request ADLR/megatron-lm!1715
Commit 77b4bfe
-
Commit 0cffc6b
-
Merge branch '318-fix-te-version-in-telinear' into 'main'
Resolve "Fix TE version in TELinear" Closes #318 See merge request ADLR/megatron-lm!2077
Commit 461b06c
-
Commit 6b78cb1
-
Merge branch 'fix_mmmu_mmodal' into 'main'
Update path to MMMU to use new repos structure See merge request ADLR/megatron-lm!2112
Commit d350231
-
ADLR/megatron-lm!1880 - Removing env variable NVTE_ALLOW_NONDETERMINISTIC_ALGO
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Shanmugam Ramasamy committed Sep 18, 2024
Commit cedd415
-
Merge branch 'bertflash' into 'main'
Removing env variable NVTE_ALLOW_NONDETERMINISTIC_ALGO See merge request ADLR/megatron-lm!1880
Shanmugam Ramasamy committed Sep 18, 2024
Commit 6b35ca8
Commits on Sep 19, 2024
-
Commit 63be779
-
Merge branch 'trintamaki/online-eval' into 'main'
Online eval See merge request ADLR/megatron-lm!2033
Commit 835af44
-
Commit 2c9bcac
-
Merge branch 'trintamaki/multi-image-mmmu' into 'main'
MMMU multi-image support See merge request ADLR/megatron-lm!1973
Commit 905de33
Commits on Sep 20, 2024
-
Commit 5c0697c
-
Merge branch 'ko3n1g/build/pip' into 'main'
build: Use multi-stage for parallel builds See merge request ADLR/megatron-lm!2113
Commit c394f78
Commits on Sep 21, 2024
-
Commit cf596b9
-
Merge branch 'dnarayanan/warning_fix' into 'main'
Only print warning when relevant See merge request ADLR/megatron-lm!2126
Commit 640e62f
-
Commit 3eeb932
-
Merge branch 'ko3n1g/tests/fix-location-of-megatron' into 'main'
tests: Fix location of megatron See merge request ADLR/megatron-lm!2124
Commit 205f946
-
Commit d210eb0
-
Merge branch 'ko3n1g/chore/bump-sha' into 'main'
ci: Bump sha See merge request ADLR/megatron-lm!2127
Commit 811a26a
Commits on Sep 22, 2024
-
Commit 405135a
-
Merge branch 'ko3n1g/ci/improve-cherry-pick-workflow' into 'main'
ci: Improve cherry pick workflow See merge request ADLR/megatron-lm!2128
Commit fba615f
-
Commit 95be3cb
-
Merge branch 'ko3n1g/ci/convergence-tests-with-jet' into 'main'
ci: Introduce JET Python SDK See merge request ADLR/megatron-lm!2034
Commit e79808c
-
Commit e10a9f4
-
Merge branch 'ko3n1g/ci/improve-cherry-pick-workflow' into 'main'
ci: Improve cherry pick MR description See merge request ADLR/megatron-lm!2130
Commit 8e69382
Commits on Sep 23, 2024
-
ADLR/megatron-lm!2119 - Huvu/t5 te10 fix nemoci pr482
Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com>
Commit e35818d
-
Merge branch 'huvu/t5_TE10_fix_nemoci_PR482' into 'main'
Huvu/t5 te10 fix nemoci pr482 See merge request ADLR/megatron-lm!2119
Commit dbd2d18
-
Commit 8c666c2
-
Merge branch 'ko3n1g/ci/cherry-pick-authro' into 'main'
ci: Set author and milestone for cherry-picks See merge request ADLR/megatron-lm!2134
Commit 6d8dc80
-
Commit c45f951
-
Merge branch 'ko3n1g/ci/notify-ut' into 'main'
ci: Send alerts on unit-tests-extended See merge request ADLR/megatron-lm!2135
Commit 08e80b0
-
Commit 643e60a
-
Merge branch 'ko3n1g/ci/fixes-to-jet' into 'main'
tests: Minor improvements to JET See merge request ADLR/megatron-lm!2133
Commit 8ec4617
-
Commit 5ade91a
-
Merge branch 'ko3n1g/tests/fix-gpt-release-samples' into 'main'
tests: Fix GPT test See merge request ADLR/megatron-lm!2136
Commit 1f2d556
-
Commit e464e94
-
Merge branch 'ko3n1g/ci/cherry-pick-strip-chars' into 'main'
ci: Fix cherry-pick strings See merge request ADLR/megatron-lm!2139
Commit 0fd4617
-
Commit ede39b8
-
Merge branch 'trintamaki/multimodal-eval-dataset' into 'main'
Use torch dataloader in multimodal evaluation See merge request ADLR/megatron-lm!2110
Commit 2065c35
-
Commit 697ea61
-
Merge branch 'ko3n1g/ci/dev-container' into 'main'
ci: Enable dev container for new features See merge request ADLR/megatron-lm!2137
Commit 075c727
Commits on Sep 24, 2024
-
Commit 5e23e72
-
Merge branch 'revert_bincount' into 'main'
Fix performance regression brought by torch.bincount Closes #263 See merge request ADLR/megatron-lm!2005
Commit 884b087
-
Commit ad38459
-
Merge branch 'trintamaki/multimodal_batch_bugfix' into 'main'
Multimodal batched bug fix See merge request ADLR/megatron-lm!2073
Commit 162b82d
-
ADLR/megatron-lm!1581 - Add MLA support into MCore
Co-authored-by: Shunkang <shunkangz@nvidia.com> Co-authored-by: BoxiangW <bwang1@fas.harvard.edu>
Commit 32eac88
-
Merge branch 'boxiangw/mla' into 'main'
Add MLA support into MCore See merge request ADLR/megatron-lm!1581
Commit dcf9e77
Commits on Sep 25, 2024
-
Commit d207755
-
Merge branch 'trintamaki/pretrain_vlm_freeze_option' into 'main'
Add freeze options to pretrain_vlm See merge request ADLR/megatron-lm!1995
Commit 891b8f9
-
Commit 31c23f5
-
Merge branch 'dnarayanan/improve_logging' into 'main'
Improve logging when decreasing batch size See merge request ADLR/megatron-lm!2145
Commit 78bef1c
-
Commit 5aceacb
-
Merge branch 'hn-set-model-eval-mode' into 'main'
Add model.eval() to run_text_generation_server.py See merge request ADLR/megatron-lm!2148
Commit 4158084
Commits on Sep 26, 2024
-
ADLR/megatron-lm!2111 - Mcore llama3.1 support
Co-authored-by: Jon Barker <jbarker@draco-oci-dc-01.cm.cluster>
Commit 368f561
-
Merge branch 'jbarker/llama3.1' into 'main'
Mcore llama3.1 support See merge request ADLR/megatron-lm!2111
Commit c1c19d1
-
Commit 1265399
-
Merge branch 'ko3n1g/ci/uts-on-dev' into 'main'
ci: Run experimental UTs on dev image See merge request ADLR/megatron-lm!2151
Commit c025cec
-
ADLR/megatron-lm!1953 - Mcore export to export models to TRTLLM (GPU and CPU version)
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@login-eos01.eos.clusters.nvidia.com>
3 people committed Sep 26, 2024
Commit f0d7120
-
Merge branch 'final_export' into 'main'
Mcore export to export models to TRTLLM (GPU and CPU version) See merge request ADLR/megatron-lm!1953
Shanmugam Ramasamy committed Sep 26, 2024
Commit 45bf4c1
-
Commit f5171f2
-
Merge branch 'ko3n1g/ci/prune-container-cache-mcore-docker-node-jet' into 'main'
ci: Prune docker cache of `mcore-docker-node-jet` See merge request ADLR/megatron-lm!2154
Commit e38d92a
-
ADLR/megatron-lm!2155 - Resolve release test failure caused by GroupedMLP distributed checkpointing
Commit c31452c
-
Merge branch 'xuwenc/release_perf_bugfix' into 'main'
Resolve release test failure caused by GroupedMLP distributed checkpointing See merge request ADLR/megatron-lm!2155
Commit d55d61a
-
Commit 3beefb5
-
Merge branch 'ko3n1g/tests/better-logging-to-wandb' into 'main'
tests: Set better name for Wandb logging See merge request ADLR/megatron-lm!2156
Commit 5553fc1
Commits on Sep 27, 2024
-
ADLR/megatron-lm!1950 - Remove pkg_resources package
Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Commit 0976661
-
Merge branch 'fix_version_checks' into 'main'
Remove pkg_resources package See merge request ADLR/megatron-lm!1950
Commit 1585be2
-
Commit 2bad957
-
Merge branch 'ko3n1g/ci/onboard-cw' into 'main'
ci: Onboard CW See merge request ADLR/megatron-lm!2142
Commit 12c2696
Commits on Sep 28, 2024
-
ADLR/megatron-lm!2158 - Small changes to export
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@login-eos01.eos.clusters.nvidia.com>
Commit 3428cd9
-
Merge branch 'new_export' into 'main'
Small changes to export See merge request ADLR/megatron-lm!2158
Commit b3375a0
Commits on Sep 30, 2024
-
Commit 5b7374a
-
Merge branch 'boxiangw/mla_backwards_comp' into 'main'
Fix rope backward compatibility See merge request ADLR/megatron-lm!2152
Commit 6ad11b0
Commits on Oct 1, 2024
-
Commit ca6d170
-
Merge branch 'auto_cudagraph_val_fix' into 'main'
[Bug fix] Don't trace graphs during inference See merge request ADLR/megatron-lm!2140
Commit dddecd1
-
ADLR/megatron-lm!2109 - Adding more MR tests for T5 (e.g., transformer_engine, distributed_checkpoint)
Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com>
Commit 5ab659b
-
Merge branch 'huvu/t5_dist_checkpoint_mrtests' into 'main'
Adding more MR tests for T5 (e.g., transformer_engine, distributed_checkpoint) See merge request ADLR/megatron-lm!2109
Commit 3efa8c2
-
Commit f07581b
-
Merge branch 'ko3n1g/ci/artifacts' into 'main'
ci: Download artifacts See merge request ADLR/megatron-lm!2164
Commit 85cd99b
Commits on Oct 2, 2024
-
Commit 858694f
-
Merge branch 'ko3n1g/ci/backwards-tag' into 'main'
ci: Bump version See merge request ADLR/megatron-lm!2165
Commit 065260b
Commits on Oct 3, 2024
-
ADLR/megatron-lm!2153 - Add the interface to set TP communication bootstrap backend
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Commit f76b465
-
Merge branch 'tp_bootstrap_backend' into 'main'
Add the interface to set TP communication bootstrap backend See merge request ADLR/megatron-lm!2153
Commit 25f7da2
-
Commit 50042ff
-
Merge branch 'convert_siglip_model' into 'main'
Add support for SigLIP vision encoder to multimodal mcore See merge request ADLR/megatron-lm!2095
Commit 4d5f94d
Commits on Oct 4, 2024
-
ADLR/megatron-lm!2175 - adding cu_seqlens_padded support in MCore
Co-authored-by: root <root@cw-dfw-h100-002-248-012.cm.cluster> Co-authored-by: Lifu Zhang <tomzhanglf@gmail.com>
Commit 2aaf85d
-
Merge branch 'add_cu_seqlens_padded_support' into 'main'
adding cu_seqlens_padded support in MCore See merge request ADLR/megatron-lm!2175
Commit c02b335
-
ADLR/megatron-lm!2181 - Fixing attention mask dimenions to support TE versions > 1.9
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Commit ee9dba2
-
Merge branch 'fixattnmask' into 'main'
Fixing attention mask dimenions to support TE versions > 1.9 See merge request ADLR/megatron-lm!2181
Commit fde8bb1
-
Commit 843a22e
-
Merge branch 'yueshen/rotary_scaling_fix_llama3_1' into 'main'
rotary_scaling fix for llama3.1 and 3.2 See merge request ADLR/megatron-lm!2180
Commit b98ec86
-
Commit 827d5b6
-
Merge branch 'ko3n1g/ci/fix-launch-script-generator' into 'main'
chore: Improve generator for launch scripts See merge request ADLR/megatron-lm!2185
Commit 31fe61a
Commits on Oct 5, 2024
-
ADLR/megatron-lm!2160 - Adding Inference pipeline for T5
Co-authored-by: Eric Harper <eharper@nvidia.com> Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com>
Commit e2a1c52
-
Merge branch 'huvu/t5_generate' into 'main'
Adding Inference pipeline for T5 See merge request ADLR/megatron-lm!2160
Commit 0acda93
-
Commit 2f9ac3c
-
Merge branch 'ko3n1g/ci/group-runs' into 'main'
ci: Group runs by model See merge request ADLR/megatron-lm!2182
Commit edb51fc
-
ADLR/megatron-lm!1862 - Cpu init te
Co-authored-by: William Dykas <wdykas@cw-dfw-cs-001-dc-02.cm.cluster> Co-authored-by: root <root@cw-dfw-h100-001-097-026.cm.cluster> Co-authored-by: William Dykas <wdykas@cs-cw-dfw-login-01.cm.cluster>
Commit cf0d855
-
Merge branch 'cpu-init-te' into 'main'
Cpu init te See merge request ADLR/megatron-lm!1862
Commit 0e6bef1
-
Commit 6939737
-
Merge branch 'ko3n1g/ci/run-script-after-export' into 'main'
ci: Run script after export See merge request ADLR/megatron-lm!2186
Commit 73e7b58
Commits on Oct 7, 2024
-
Commit 6ca379e
-
Merge branch 'runtime-upcycling' into 'main'
Fix upcycling issues. See merge request ADLR/megatron-lm!2089
Commit ff5cee9
-
Commit a559ec1
-
Merge branch 'ko3n1g/ci/fix-env-export' into 'main'
tests: Fix ENV export See merge request ADLR/megatron-lm!2189
Commit 3f90b98
Commits on Oct 9, 2024
-
Commit e108535
-
Merge branch 'ko3n1g/ci/fix-env-export' into 'main'
tests: Fix ENV export See merge request ADLR/megatron-lm!2194
Commit 3f43927
-
ADLR/megatron-lm!1790 - GroupedMLP DistOpt Resharding and add UTs to ChainedOptimizer Support for distributed checkpointing
Commit fbdc916
-
Merge branch 'hongxiaob/moe_dist_ckpt' into 'main'
GroupedMLP DistOpt Resharding and add UTs to ChainedOptimizer Support for distributed checkpointing See merge request ADLR/megatron-lm!1790
Commit b1218b9
-
Commit 5776d06
-
Merge branch 'ko3n1g/ci/always-artifacts' into 'main'
ci: Always upload artifacts See merge request ADLR/megatron-lm!2197
Commit bf74129
-
Commit 0e3eaa5
-
Merge branch 'trintamaki/data-parallel-inference' into 'main'
Data parallel inference See merge request ADLR/megatron-lm!2141
Commit fcdbf90
-
Commit 37a2116
-
Merge branch 'vitalyk/testfix' into 'main'
Remove CUDA requirement from cpu test. See merge request ADLR/megatron-lm!2199
Commit 228dc20
Commits on Oct 10, 2024
-
Commit f462160
-
Merge branch 'packed_seq_padded_support' into 'main'
Support padding between subsequences of Packed Sequence See merge request ADLR/megatron-lm!2096
Commit 7e90ec0
-
Commit 566d9cd
-
Merge branch 'revert-228dc204' into 'main'
Revert "Merge branch 'vitalyk/testfix' into 'main'" See merge request ADLR/megatron-lm!2206
Commit b60f5d0
Commits on Oct 11, 2024
-
Commit 13c39ac
-
Merge branch 'sasatheesh/tokenizer_offsets' into 'main'
Standard interface for getting offsets from tokenizers See merge request ADLR/megatron-lm!1909
Commit 47bb8d1
-
Commit 8c018ca
-
Merge branch 'ko3n1g/ci/flaky-marker' into 'main'
tests: Use flaky instead of skip marker See merge request ADLR/megatron-lm!2208
Commit 772faca
Commits on Oct 16, 2024
-
Commit 831d64d
-
Merge branch 'ko3n1g/chore/bump-pyt' into 'main'
chore: Bump Pytorch container See merge request ADLR/megatron-lm!2017
Commit 4876ee1
-
Commit bc4874c
-
Merge branch 'add_siglip_converter' into 'main'
Add siglip converter to multimodal example See merge request ADLR/megatron-lm!2214
Commit 6bafe92
-
Commit a30d63b
-
Merge branch 'dnarayanan/fix_import' into 'main'
Add missing import to megatron/training/initialize.py See merge request ADLR/megatron-lm!2226
Commit 0d89fc4
Commits on Oct 18, 2024
-
Commit 33d2f45
-
Merge branch 'ko3n1g/ci/refactor-jobs' into 'main'
ci(refactor): Facelift gitlab-ci See merge request ADLR/megatron-lm!2223
Commit 55622ff
-
Commit cba8bdc
-
Merge branch 'ko3n1g/ci/test-dependencies' into 'main'
ci: Set stronger dependencies See merge request ADLR/megatron-lm!2234
Commit ecf0dbe
Commits on Oct 19, 2024
-
Commit 839dff2
-
Merge branch 'duncan/triton-cache-fix' into 'main'
Triton cache fix See merge request ADLR/megatron-lm!2075
Commit b7814bb
-
Commit a9c16c5
-
Merge branch 'lit/fix_multi_tensor_scale' into 'main'
fix an issue when using `multi_tensor_scale` from TE See merge request ADLR/megatron-lm!1939
Commit 02d1762
-
ADLR/megatron-lm!1927 - Improved missing key exception for errors during checkpoint io
Commit 6adf0bd
-
Merge branch 'jstjohn/improved_missing_key_exception' into 'main'
Improved missing key exception for errors during checkpoint io See merge request ADLR/megatron-lm!1927
Commit db6cb4e
-
Commit 2c950a5
-
Merge branch 'pmannan/llava_debug' into 'main'
LLaVA Multimodal SP support See merge request ADLR/megatron-lm!2038
Commit 739177e
-
Commit d28e26e
-
Merge branch 'qwen25_conversion' into 'main'
qwen2.5 conversion See merge request ADLR/megatron-lm!2227
Commit db7d37b