Hi, thanks for this amazing repo.
I was wondering how I should set the batch size to get a desired full (global) batch size.
For example, if I set `train_dataset.huggingface_dataset.batch_size` to 1 on a TPUv3-8,
what is the full batch size for `mesh_dim` `1,1,-1`, `1,-1,1`, and `-1,1,1`?
Are they all 8, or 1?
Thanks!
Different mesh dims correspond to different sharding strategies. They do not define the batch size themselves, but they do impose constraints on the possible batch sizes (see the sketch below):

- `1,1,-1` corresponds to tensor parallelism only; you can use any batch size you want.
- `1,-1,1` corresponds to full FSDP; your batch size needs to be a multiple of the number of devices (8 here).
- `-1,1,1` corresponds to full DP; your batch size also needs to be a multiple of the number of devices (8 here).
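Here is a minimal sketch of how a three-axis `mesh_dim` string maps onto the devices and where the divisibility constraint comes from. The axis names (`dp`, `fsdp`, `mp`), the `mesh_from_dims` helper, and the batch partition spec are illustrative assumptions, not the repo's exact code:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

def mesh_from_dims(dims_str):
    # Hypothetical helper: "-1" absorbs all remaining devices,
    # just like a numpy reshape.
    dims = [int(d) for d in dims_str.split(",")]
    devices = np.array(jax.devices()).reshape(dims)
    return Mesh(devices, axis_names=("dp", "fsdp", "mp"))

# On a TPUv3-8 (8 devices):
#   "1,1,-1" -> mesh shape (1, 1, 8): tensor parallelism only
#   "1,-1,1" -> mesh shape (1, 8, 1): full FSDP
#   "-1,1,1" -> mesh shape (8, 1, 1): full data parallelism
mesh = mesh_from_dims("1,-1,1")

# Assuming the batch dimension is sharded over the ("dp", "fsdp") axes,
# the global batch size must be divisible by 1 * 8 = 8 on this mesh.
batch_sharding = NamedSharding(mesh, P(("dp", "fsdp")))
batch = jax.device_put(jnp.zeros((8, 2048)), batch_sharding)  # ok
# A global batch of 1 would fail here: 1 is not divisible by 8.
# Under "1,1,-1", dp * fsdp = 1, so any batch size is accepted.
```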
Did you get this to run on a v3? I always seem to get HLO out-of-memory errors.