
The training step of CQL-SAC. #1

Open
DooyoungH opened this issue Sep 29, 2021 · 1 comment

@DooyoungH

I am studying your CQL code as a reference, but when I run train.py of CQL-SAC, I think line 68 is not appropriate for offline RL:

Line 68: buffer.add(state, action, reward, next_state, done)

Doesn't this line make it an online, off-policy setup rather than an offline one, since it fills the buffer with data from the agent's own interaction with the environment?

Thank you for your hard work.


@BY571 (Owner) commented Sep 29, 2021

Yes, indeed, this is only for the online RL setting. For an SL setting or a batch RL (offline) setting you would have to adapt that.
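To illustrate the adaptation being discussed, here is a minimal sketch (not the repo's actual code; `ReplayBuffer`, `fill_from_dataset`, and the dataset layout are all hypothetical) of replacing the online `buffer.add(...)` call with a one-time fill of the buffer from a fixed dataset, which is what an offline/batch RL setup would do:

```python
import random
from collections import deque, namedtuple

# Hypothetical minimal buffer mirroring the shape of the call at line 68:
# buffer.add(state, action, reward, next_state, done)
Transition = namedtuple("Transition",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

def fill_from_dataset(buffer, dataset):
    """Offline variant: populate the buffer once from a fixed dataset,
    instead of adding transitions the agent collects online."""
    for s, a, r, s2, d in zip(dataset["observations"],
                              dataset["actions"],
                              dataset["rewards"],
                              dataset["next_observations"],
                              dataset["terminals"]):
        buffer.add(s, a, r, s2, d)

# Toy stand-in for a real offline dataset (e.g. one loaded from D4RL).
dataset = {
    "observations":      [[0.0], [1.0], [2.0]],
    "actions":           [[0.1], [0.2], [0.3]],
    "rewards":           [1.0, 0.0, 1.0],
    "next_observations": [[1.0], [2.0], [3.0]],
    "terminals":         [False, False, True],
}

buffer = ReplayBuffer()
fill_from_dataset(buffer, dataset)
# The training loop then only samples from this fixed data and never
# calls buffer.add() with environment interactions.
batch = buffer.sample(2)
print(len(buffer.memory))  # 3
```

The environment is still needed only if you want to evaluate the learned policy, not to collect training data.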
