Currently the "ReplayBuffer" object uses "jnp.roll" to make room for new data. This works fine if the data collected during each iteration is small enough. However, if we think about the tasks where the episode length is 1000, we will see that each data will only be kept for very few iterations before replaced by more recent ones.
For example, with 256 evaluations per iteration and an episode length of 1000, each iteration produces 256,000 transitions, so a replay buffer of size 1 million can only hold the 4 most recent iterations of data. Since the data are episodic, collected iteratively, and therefore highly correlated, keeping only the latest 4 iterations is not enough to approximate the i.i.d. assumption of neural-network training, even with random sampling.
Hence, I recommend adding a new method to the "ReplayBuffer" object, "random_insert(self, key: RNGKey, transitions: Transition)", that randomly selects a subset of existing entries to be replaced by the new data. This ensures, first, that the data from the latest evaluation always appear in the replay buffer; second, that historical data are replaced in an exponential manner (discount factor = 1 - amount_of_data_collected_each_iteration / replay_buffer_size per iteration); and, most importantly, it does so without enlarging the replay buffer, which saves VRAM. A sketch of the idea follows.
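Here is a minimal sketch of what "random_insert" could look like, under the same simplifying assumptions as the snippet above (a flat (buffer_size, transition_dim) data array standing in for the actual "Transition" pytree; names are hypothetical):

```python
import jax
import jax.numpy as jnp

def random_insert(
    key: jax.Array, data: jnp.ndarray, transitions: jnp.ndarray
) -> jnp.ndarray:
    """Overwrite a uniformly random subset of buffer slots with the new batch.

    Every new transition is guaranteed to be stored, while each existing
    entry survives the insertion with probability 1 - num_new / buffer_size,
    so old data decays geometrically instead of being evicted wholesale as
    in the FIFO scheme above.
    """
    buffer_size = data.shape[0]
    num_new = transitions.shape[0]
    # Sample distinct slots so collisions never silently drop part of the
    # new batch (requires num_new <= buffer_size).
    slots = jax.random.choice(key, buffer_size, shape=(num_new,), replace=False)
    return data.at[slots].set(transitions)
```

With this scheme, a transition collected k iterations ago is still present with probability (1 - b/N)^k, where b is the amount of data collected per iteration and N the buffer size. With the numbers above (b = 256,000, N = 1,000,000) that is roughly 0.744^k, so data from about 10 iterations back still survives with around 5% probability, instead of being completely flushed after 4 iterations.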