
Some demos on how to configure offloading tensors to an NVMe device #6752

Open
niebowen666 opened this issue Nov 15, 2024 · 2 comments

Comments

@niebowen666

Dear authors:
I am wondering if I could get some demo configs for training a language model with ZeRO-Infinity?
I am quite confused about how to configure "offload_param" and "offload_optimizer".
Thanks!
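For reference, a ZeRO-Infinity config that offloads to NVMe generally follows the sketch below. The nvme_path "/local_nvme" is a hypothetical mount point, and the buffer and aio values are illustrative, not tuned recommendations:

ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",  # hypothetical directory on the NVMe drive
            "pin_memory": True,
            "buffer_count": 4
        },
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
            "pin_memory": True,
            "buffer_count": 5,
            "buffer_size": 1e8,
            "max_in_cpu": 1e9
        }
    },
    # Asynchronous I/O settings used by the NVMe offload engine
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "thread_count": 1,
        "single_submit": False,
        "overlap_events": True
    }
}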

@niebowen666 (Author)

Thanks for your reply!
But I have run into some new issues.
I used the following config while training LLaMA:

ds_config = {
    "train_batch_size": 32,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 0.00006,
            "betas": [0.9, 0.95],
            "weight_decay": 0.01
        }
    },
    "zero_optimization": {
        "stage": 3,  # ZeRO-3: partition parameters, gradients, and optimizer states
        "contiguous_gradients": True,
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_prefetch_bucket_size": 1e7,
        "stage3_param_persistence_threshold": 1e5,
        "reduce_bucket_size": 1e7,
        "sub_group_size": 1e9,
        "offload_optimizer": {
            "device": "cpu"  # offload optimizer states to CPU memory
        },
        "offload_param": {
            "device": "cpu"  # offload parameters to CPU memory
        }
    }
}
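This dict is handed to DeepSpeed in the usual way; a minimal sketch, with model construction and the training loop omitted and `model` assumed to be an already-built torch.nn.Module:

import deepspeed

# Wrap the model with the DeepSpeed engine using the config above.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config
)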

I get an "out of memory" error when I set train_batch_size to 64.
I also read your source code, and one thing confuses me:
In runtime/engine.py, lines 314 to 322, it looks as if configuring the "optimizer" as Adam means _configure_zero_optimizer is never run, so the tensors generated by the model would not be offloaded to CPU.
Is my understanding right?
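One knob worth noting for the OOM at batch size 64: DeepSpeed can split the global batch into micro-batches via gradient accumulation, which lowers per-step activation memory while keeping the effective batch size. A sketch with illustrative values (train_batch_size = micro_batch × accumulation_steps × number of GPUs):

ds_config_batch = {
    # Effective batch of 64 on a single GPU: 8 micro-batch × 8 accumulation steps
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 8
}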
