
[DP] change device mesh dim naming convention to make it more consistent #720

Merged: 5 commits merged into main on Dec 11, 2024

Conversation

@XilunWu (Contributor) commented on Dec 5, 2024

Stack from ghstack (oldest at bottom):

Summary
This PR improves the design of the DeviceMesh hierarchy in torchtitan. We now classify all device meshes other than world_mesh into 2 categories:

  1. Basic meshes: meshes defined by users in the job .toml file. These include pp (pipeline_parallel_degree), dp_replicate (data_parallel_replicate_degree), dp_shard (data_parallel_shard_degree), tp (tensor_parallel_degree), and cp (context_parallel_degree).
  2. Synthesized meshes (also called "derived meshes"): meshes synthesized from basic meshes via _flatten(). If a mesh is synthesized from a single basic mesh, this is equivalent to aliasing that mesh. So far we use 2 synthesized meshes: dp and dp_shard_cp. The dp mesh is used for data loading, and the dp_shard_cp mesh is used for model parameter sharding (see the sketch after this list).
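For illustration, here is a minimal sketch (not the torchtitan source) of how the basic dims could be assembled into a world mesh and then flattened into the derived dp and dp_shard_cp meshes. The degree values are hypothetical, a distributed run (e.g. via torchrun) with a matching world size is assumed, and _flatten() is a private DeviceMesh API.

```python
# Minimal sketch, assuming PyTorch with DeviceMesh slicing and the private
# _flatten() API; not the actual torchtitan implementation.
from torch.distributed.device_mesh import init_device_mesh

# Basic dims, as a user would configure them in the job .toml file
# (hypothetical values; world size must equal their product, here 8).
pp, dp_replicate, dp_shard, cp, tp = 1, 2, 2, 1, 2

world_mesh = init_device_mesh(
    "cuda",
    (pp, dp_replicate, dp_shard, cp, tp),
    mesh_dim_names=("pp", "dp_replicate", "dp_shard", "cp", "tp"),
)

# Synthesized (derived) meshes, built by flattening slices of the world mesh.
# Flattening a single dim would simply alias that dim under the new name.
world_mesh["dp_replicate", "dp_shard"]._flatten(mesh_dim_name="dp")
world_mesh["dp_shard", "cp"]._flatten(mesh_dim_name="dp_shard_cp")

dp_mesh = world_mesh["dp"]                    # used for data loading
dp_shard_cp_mesh = world_mesh["dp_shard_cp"]  # used for model parameter sharding
```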

Test
CI

@tianyu-l (Contributor) left a comment


lgtm!

@XilunWu changed the base branch from gh/XilunWu/13/base to main on December 11, 2024 at 22:42
@XilunWu merged commit cb633e3 into main on Dec 11, 2024
5 checks passed
Labels: CLA Signed