Model or optimizer wrapping? #3

andreped · 2022-06-01T20:17:05Z


We currently have a working solution for gradient accumulation that is based on overriding the train_step method of a tf.keras.Model. This is quite straightforward and easy to implement. For convenience, we have implemented a class for this exact task called GAModelWrapper.
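To make the idea concrete, here is a minimal, framework-free sketch of accumulating gradients inside an overridden train_step. It is not the GAModelWrapper implementation and makes no TensorFlow calls; `TinyModel`, `AccumulatingModel`, and `accum_steps` are hypothetical names chosen for illustration, and the sketch assumes all batches have the same size.

```python
# Framework-free sketch; every class and parameter name here is hypothetical.

class TinyModel:
    """Linear model y = w * x trained with plain SGD on squared error."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def gradient(self, batch):
        # Mean gradient of (w*x - y)^2 with respect to w over the batch.
        return sum(2.0 * (self.w * x - y) * x for x, y in batch) / len(batch)

    def train_step(self, batch):
        # Default behaviour: update the weight on every batch.
        self.w -= self.lr * self.gradient(batch)


class AccumulatingModel(TinyModel):
    """Overrides train_step so the weight changes only every accum_steps batches."""

    def __init__(self, accum_steps, **kwargs):
        super().__init__(**kwargs)
        self.accum_steps = accum_steps
        self._grad_sum = 0.0
        self._seen = 0

    def train_step(self, batch):
        # Scale each contribution so the running sum equals the mean
        # gradient over the accumulated (equally sized) batches.
        self._grad_sum += self.gradient(batch) / self.accum_steps
        self._seen += 1
        if self._seen == self.accum_steps:
            self.w -= self.lr * self._grad_sum
            self._grad_sum = 0.0
            self._seen = 0


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

big = TinyModel()
big.train_step(data)          # one update on the full batch of 4

acc = AccumulatingModel(accum_steps=2)
acc.train_step(data[:2])      # no update yet, gradient is only stored
acc.train_step(data[2:])      # update with the averaged gradient
```

In this sketch, two accumulated half-batches produce the same weight as one step on the full batch, which is the property gradient accumulation is meant to provide.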

However, overriding train_step might not be optimal. There are situations where people create advanced models that need their own train_step override. Using our GAModelWrapper would then be a bad idea, as it would replace their own edits. They would instead have to incorporate what we did in GAModelWrapper themselves, but that somewhat defeats the purpose...

On the other hand, the optimizer wrapper approach does not seem to work currently. The model wrapper approach therefore seems to be the only viable approach to gradient accumulation in TF 2.

Any thoughts? What could we do to make a solution that is good enough to be integrated into Keras?

Answered by andreped


andreped · 2023-01-29T17:28:42Z


The latest release, v0.3.0, now supports both approaches:
https://github.com/andreped/GradientAccumulator/releases/tag/v0.3.0

The main reason optimizer wrapping is the better solution in the current state of TF2 is that the multi-GPU distribution strategy is incompatible with our train_step approach. It should, however, work with the optimizer wrapper approach. Multi-GPU support is to be added in the future.
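For readers unfamiliar with the optimizer wrapper idea, here is a minimal, framework-free sketch. It is not the API shipped in v0.3.0 and makes no TensorFlow calls; `SGD`, `AccumulatingOptimizer`, and `accum_steps` are hypothetical names. The point it illustrates is that accumulation lives inside the optimizer's update call, so whatever training step calls the optimizer is left untouched.

```python
# Framework-free sketch; every class and parameter name here is hypothetical.

class SGD:
    """Plain SGD acting on a dict of named scalar parameters."""

    def __init__(self, lr):
        self.lr = lr

    def apply_gradients(self, grads, params):
        for name, g in grads.items():
            params[name] -= self.lr * g


class AccumulatingOptimizer:
    """Wraps any optimizer exposing apply_gradients(grads, params).

    Gradients are summed for accum_steps calls; on the last call their
    average is forwarded to the wrapped optimizer and the sums are reset.
    """

    def __init__(self, optimizer, accum_steps):
        self.optimizer = optimizer
        self.accum_steps = accum_steps
        self._sums = {}
        self._calls = 0

    def apply_gradients(self, grads, params):
        for name, g in grads.items():
            self._sums[name] = self._sums.get(name, 0.0) + g
        self._calls += 1
        if self._calls == self.accum_steps:
            averaged = {n: s / self.accum_steps for n, s in self._sums.items()}
            self.optimizer.apply_gradients(averaged, params)
            self._sums = {}
            self._calls = 0


params = {"w": 1.0}
opt = AccumulatingOptimizer(SGD(lr=0.5), accum_steps=2)

opt.apply_gradients({"w": 2.0}, params)   # stored only, params unchanged
opt.apply_gradients({"w": 4.0}, params)   # applies the average gradient 3.0
```

Because the caller only ever sees `apply_gradients`, a user-defined training step can keep its own logic and still get accumulation by swapping in the wrapped optimizer.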
