
[Feature Request] Training data selection: Create more "interesting" Replay Buffer Iterators #66

Open
RaghuSpaceRajan opened this issue Apr 7, 2021 · 4 comments
Labels: enhancement (New feature or request)

Comments

@RaghuSpaceRajan

🚀 Feature Request

Create Replay Buffer Iterators that can select training and validation data in various "interesting" ways, similar to TransitionIterator and BootStrapIterator in
https://github.com/facebookresearch/mbrl-lib/blob/b0aabd79941efe8b56bcabbd1b43bf497b9b1746/mbrl/replay_buffer.py

Examples:

  1. Select transitions from highly rewarding trajectories; this could be used to analyze how data selection impacts MBRL, objective mismatch, etc.
  2. Select transitions randomly from the replay buffer, to obtain training/validation sets of a fixed size.

Motivation

This would make analyses similar to those in https://arxiv.org/abs/2002.04523 and https://arxiv.org/abs/2102.13651 easy to perform.

Pitch

It should be fairly easy to implement in a manner similar to TransitionIterator and BootStrapIterator above. (Taking care of trajectory/episode boundaries could be a bit tricky.)
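
For concreteness, here is a minimal sketch of example 1 above: an iterator that yields batches drawn only from the top-k highest-return trajectories. This is not the mbrl-lib API; the class name HighReturnIterator, the flat (obs, act, rewards, dones) array layout, and the assumption that the buffer contains only complete trajectories are all illustrative.

```python
import numpy as np


class HighReturnIterator:
    """Yields batches of transitions drawn only from high-return trajectories."""

    def __init__(self, obs, act, rewards, dones, batch_size, top_k, rng=None):
        self.obs, self.act = obs, act
        self.batch_size = batch_size
        self.rng = rng if rng is not None else np.random.default_rng()

        # Recover trajectory boundaries from the done flags. This is the
        # trajectory-boundary bookkeeping mentioned in the pitch; it assumes
        # the buffer holds only complete trajectories.
        ends = np.flatnonzero(dones) + 1
        starts = np.concatenate(([0], ends[:-1]))

        # Rank trajectories by undiscounted return and keep the top k.
        returns = np.array([rewards[s:e].sum() for s, e in zip(starts, ends)])
        best = np.argsort(returns)[-top_k:]

        # Flatten the selected trajectories back into transition indices.
        self.indices = np.concatenate([np.arange(starts[i], ends[i]) for i in best])

    def __iter__(self):
        order = self.rng.permutation(self.indices)
        for i in range(0, len(order), self.batch_size):
            idx = order[i : i + self.batch_size]
            yield self.obs[idx], self.act[idx]
```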

RaghuSpaceRajan added the enhancement label on Apr 7, 2021
@luisenp
Contributor

luisenp commented Apr 7, 2021

Thanks @RaghuSpaceRajan. cc'ing @natolambert since this is highly relevant to his work. I think this proposal is the most straightforward way to do this on the data-management side.

@natolambert
Contributor

Yes, I have a version of this in my private repo; I will create a PR for it soon. The way I did it was to associate a "weight" with each transition, and some of the core functionality was a function to "update weights" for each trajectory. When updating the weights, it would be easy to create a ranking or some sort of heuristic mapping.
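
For concreteness, a rough sketch of this weighting idea (hypothetical names and storage layout; not the actual code from the private repo): each transition carries a weight, update_weights recomputes them one trajectory at a time, and sampling is proportional to the weights, so e.g. `buf.update_weights(lambda r: r.sum())` weights trajectories by their return.

```python
import numpy as np


class WeightedReplayBuffer:
    """Toy buffer in which every transition carries a sampling weight."""

    def __init__(self, capacity):
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=bool)
        self.weights = np.ones(capacity, dtype=np.float32)
        self.size = 0

    def add(self, reward, done):
        self.rewards[self.size] = reward
        self.dones[self.size] = done
        self.size += 1

    def update_weights(self, weight_fn):
        """Recompute weights one trajectory at a time.

        weight_fn maps a trajectory's reward array to a scalar weight,
        e.g. lambda r: r.sum() to weight trajectories by their return.
        """
        start = 0
        for end in np.flatnonzero(self.dones[: self.size]) + 1:
            self.weights[start:end] = weight_fn(self.rewards[start:end])
            start = end

    def sample_indices(self, batch_size, rng=None):
        # Draw transition indices with probability proportional to weight.
        rng = rng if rng is not None else np.random.default_rng()
        p = self.weights[: self.size].astype(np.float64)
        return rng.choice(self.size, size=batch_size, p=p / p.sum())
```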

@natolambert
Contributor

Related comment: I think it may be worthwhile to have an optional "rich logging" mode, where things like candidate actions, action sequences (plans) at each step, trajectories, and more are saved for every trial in the learning process. This accumulates a lot of data, but having access to it is useful for debugging.
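
One possible shape for such a mode (purely illustrative; all names here are assumptions, not existing mbrl-lib code): a per-step record storing the planner's candidate action sequences and the selected plan, accumulated into a per-trial log.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class StepRecord:
    obs: np.ndarray                # observation at this step
    candidate_actions: np.ndarray  # every action sequence the planner scored
    plan: np.ndarray               # the action sequence actually selected
    reward: float


@dataclass
class TrialLog:
    """Accumulates one StepRecord per environment step of a trial."""

    steps: List[StepRecord] = field(default_factory=list)

    def log_step(self, obs, candidate_actions, plan, reward):
        self.steps.append(StepRecord(obs, candidate_actions, plan, reward))
```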

@luisenp
Contributor

luisenp commented Apr 7, 2021

Feel free to open a feature request issue for this as well, @natolambert.
