From NVIDIA Megatron-LM for visibility #18

Open

RaymondLi0 wants to merge 3,242 commits into the base branch multi-query-attention

Conversation

RaymondLi0 (Collaborator)

No description provided.

@RaymondLi0 RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12
@RaymondLi0 RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12
mikolajblaz and others added 28 commits September 5, 2024 10:17
Optimize broadcasted data during parallel load

See merge request ADLR/megatron-lm!1968
Fix description of distributed optimizer workflow

See merge request ADLR/megatron-lm!1951
Add native-fp8

See merge request ADLR/megatron-lm!1669
Restore the actual PyT 2.4 fix from !1970

See merge request ADLR/megatron-lm!2039
tests: Skip flaky mamba test

See merge request ADLR/megatron-lm!2044
ci: Bump reference sha

See merge request ADLR/megatron-lm!2048
Add model config files for Mixtral-8x7B and Mixtral-8x22B performance benchmarking

See merge request ADLR/megatron-lm!2029
Co-authored-by: William Dykas <wdykas@cw-dfw-cs-001-dc-02.cm.cluster>
Co-authored-by: William Dykas <wdykas@cw-dfw-cs-001-dc-01.cm.cluster>
Co-authored-by: William Dykas <wdykas@cs-cw-dfw-login-01.cm.cluster>
Co-authored-by: William Dykas <wdykas@cs-cw-dfw-dc-02.cm.cluster>
Uneven Pipeline Parallelism

See merge request ADLR/megatron-lm!1881
Co-authored-by: Jon Barker <jbarker@draco-oci-dc-01.cm.cluster>
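For readers unfamiliar with the feature named in the "Uneven Pipeline Parallelism" entry above: it lets different pipeline stages own different numbers of transformer layers instead of forcing an equal split. The sketch below only illustrates that idea in plain Python; the function name and the per-stage layer counts are made up for this example and are not Megatron-LM's actual API.

```python
def split_layers_unevenly(num_layers, layers_per_stage):
    """Assign an explicit, possibly uneven, number of layers to each pipeline stage.

    Returns a list of (first_layer, last_layer_exclusive) ranges, one per stage.
    """
    assert sum(layers_per_stage) == num_layers, "per-stage counts must cover all layers"
    ranges, start = [], 0
    for count in layers_per_stage:
        ranges.append((start, start + count))
        start += count
    return ranges


# Example: 26 layers over 4 stages, with the first and last stages holding fewer
# layers (e.g. to leave room for the embedding and output layers).
print(split_layers_unevenly(26, [5, 8, 8, 5]))
# [(0, 5), (5, 13), (13, 21), (21, 26)]
```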
Add support for pytorch tensorboard profiler

See merge request ADLR/megatron-lm!1912
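The merge request above adds PyTorch TensorBoard profiler support to training. As a rough illustration of what such an integration typically looks like, the sketch below uses only the public torch.profiler API; the schedule values, log directory, and the train_step placeholder are assumptions for this example, not the flags or code actually added in !1912.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler


def train_step():
    """Placeholder for one training iteration."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(64, 1024, device=device)
    (x @ x.T).sum().item()


# Profile a handful of iterations and write traces that TensorBoard can display
# (via the torch-tb-profiler plugin).
prof = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=tensorboard_trace_handler("./profiler_logs"),
    record_shapes=True,
)

prof.start()
for _ in range(6):
    train_step()
    prof.step()  # advance the profiler schedule after every iteration
prof.stop()
```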
ci: Pass `LOAD_PATH` into training

See merge request ADLR/megatron-lm!2050
Update check_param_hashes_across_dp_replicas to return true if hashes across all DP ranks match.

See merge request ADLR/megatron-lm!1958
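The entry above changes check_param_hashes_across_dp_replicas so that it returns True only when the hashes computed on all data-parallel ranks agree. The sketch below shows the general shape of such a check using plain torch.distributed primitives; the function name, hashing scheme, and gathering strategy here are assumptions for illustration, not Megatron-LM's actual implementation.

```python
import hashlib

import torch
import torch.distributed as dist


def param_hashes_match_across_dp(params, dp_group=None):
    """Return True only if every rank in the data-parallel group holds
    byte-identical parameter values (illustrative sketch)."""
    # Hash this rank's local copy of the parameters.
    hasher = hashlib.sha256()
    for p in params:
        hasher.update(p.detach().cpu().flatten().view(torch.uint8).numpy().tobytes())
    local_hash = hasher.hexdigest()

    # Gather every rank's hash and require them all to match.
    world_size = dist.get_world_size(group=dp_group)
    all_hashes = [None] * world_size
    dist.all_gather_object(all_hashes, local_hash, group=dp_group)
    return all(h == all_hashes[0] for h in all_hashes)
```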
Per layer cudagraph support for GPT training with Transformer Engine modules

See merge request ADLR/megatron-lm!1796
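The merge request above enables capturing individual Transformer Engine layers into CUDA graphs during GPT training. As background only, the sketch below shows the standard per-module capture-and-replay pattern from PyTorch's public CUDA graph API; the warm-up loop, tensor shapes, and the plain nn.Linear stand-in are assumptions, not the Transformer Engine integration itself.

```python
import torch

# Stand-in for one transformer layer; the real feature targets Transformer Engine modules.
layer = torch.nn.Linear(1024, 1024).cuda()
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so capture starts from a quiescent state.
side_stream = torch.cuda.Stream()
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    for _ in range(3):
        layer(static_input)
torch.cuda.current_stream().wait_stream(side_stream)

# Capture one forward pass of the layer into a CUDA graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = layer(static_input)

# Replay: copy fresh data into the static input buffer, then launch the whole
# captured kernel sequence with a single call.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
graph.replay()
print(static_output.shape)
```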
Update model config files for Mixtral-8x7B and Mixtral-8x22B performance benchmarking

See merge request ADLR/megatron-lm!2053
Revert "ADLR/megatron-lm!1747 - Use TP-CP group for fp8 amax reduction"

See merge request ADLR/megatron-lm!1971
Vitaly Kurin and others added 30 commits October 9, 2024 15:48
Remove CUDA requirement from cpu test.

See merge request ADLR/megatron-lm!2199
Support padding between subsequences of Packed Sequence

See merge request ADLR/megatron-lm!2096
Revert "Merge branch 'vitalyk/testfix' into 'main'"

See merge request ADLR/megatron-lm!2206
Standard interface for getting offsets from tokenizers

See merge request ADLR/megatron-lm!1909
tests: Use flaky instead of skip marker

See merge request ADLR/megatron-lm!2208
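The CI change above replaces an outright skip with a flaky marker, so the test still runs but tolerates intermittent failures. One common way to express this is shown below, assuming a pytest-rerunfailures-style marker; the marker Megatron-LM's test suite actually defines and the test name are assumptions.

```python
import pytest


@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_mamba_forward_pass():
    # Hypothetical flaky test: rerun up to 3 times, waiting 2 s between
    # attempts, instead of skipping it entirely.
    ...
```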
chore: Bump Pytorch container

See merge request ADLR/megatron-lm!2017
Add siglip converter to multimodal example

See merge request ADLR/megatron-lm!2214
Add missing import to megatron/training/initialize.py

See merge request ADLR/megatron-lm!2226
ci(refactor): Facelift gitlab-ci

See merge request ADLR/megatron-lm!2223
ci: Set stronger dependencies

See merge request ADLR/megatron-lm!2234
Triton cache fix

See merge request ADLR/megatron-lm!2075
fix an issue when using `multi_tensor_scale` from TE

See merge request ADLR/megatron-lm!1939
Improved missing key exception for errors during checkpoint io

See merge request ADLR/megatron-lm!1927
LLaVA Multimodal SP support

See merge request ADLR/megatron-lm!2038
qwen2.5 conversion

See merge request ADLR/megatron-lm!2227