Problem
Preference-based reinforcement learning on a POMDP problem. In the paper, the authors say that the reward model can use a recurrent neural network to solve the POMDP problem.
Solution
I added a GRU to the reward network to handle the POMDP problem. Please see my repo.
My main ideas (a sketch follows this list):
- `BufferingWrapper` and `RewardVecEnvWrapper` must be merged so that the `hidden_state` is saved alongside the `observation`, `action`, etc.
- To support recurrent reward network ensembling, I generate one `hidden_state` per ensemble member, i.e. as many hidden states as `ensemble_size`.
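Below is a minimal sketch of what I mean by a recurrent reward ensemble, not the actual code from my repo: the class names (`GRURewardNet`, `RewardEnsemble`) and hyperparameters are illustrative, and only the idea of one persistent `hidden_state` per ensemble member comes from the description above.

```python
import torch
import torch.nn as nn

class GRURewardNet(nn.Module):
    """GRU-based reward model: maps (obs, act) sequences to per-step rewards."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, obs, act, hidden_state=None):
        # obs: (batch, seq_len, obs_dim); act: (batch, seq_len, act_dim)
        x = torch.cat([obs, act], dim=-1)
        out, next_hidden = self.gru(x, hidden_state)
        reward = self.head(out).squeeze(-1)  # (batch, seq_len)
        return reward, next_hidden

class RewardEnsemble(nn.Module):
    """Ensemble of recurrent reward nets, one hidden state per member."""

    def __init__(self, obs_dim: int, act_dim: int, ensemble_size: int = 3):
        super().__init__()
        self.members = nn.ModuleList(
            [GRURewardNet(obs_dim, act_dim) for _ in range(ensemble_size)]
        )
        # One hidden state per member, advanced at every environment step;
        # these are what gets stored in the buffer next to obs/act.
        self.hidden_states = [None] * ensemble_size

    def step(self, obs, act):
        rewards = []
        for i, net in enumerate(self.members):
            r, self.hidden_states[i] = net(obs, act, self.hidden_states[i])
            rewards.append(r)
        return torch.stack(rewards).mean(dim=0)  # mean ensemble reward
```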
Result
I applied this in the `BipedalWalker-v3` env with `AbsorbAfterDoneWrapper` from your sister project seals (setup sketched below).
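For reference, here is the environment setup in one block. This assumes the standard `gymnasium`/`seals` usage (older seals versions import from `gym` instead); it is not a copy of my training script.

```python
import gymnasium as gym
from seals.util import AbsorbAfterDoneWrapper

env = gym.make("BipedalWalker-v3")
# Repeat an absorbing state after termination so episodes have a fixed
# length, which keeps trajectory fragments comparable for preference queries.
env = AbsorbAfterDoneWrapper(env)
```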
Addition
I added `dict_preference.py` to support `Dict`-type observation spaces.
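As an illustration of the idea behind `dict_preference.py` (the file itself may do this differently), a `Dict` observation can be flattened into a single vector before it reaches the reward network; the example space and helper below are made up for this sketch.

```python
import numpy as np
from gymnasium import spaces
from gymnasium.spaces.utils import flatten

# Example Dict observation space (illustrative only).
obs_space = spaces.Dict({
    "position": spaces.Box(-1.0, 1.0, shape=(2,)),
    "velocity": spaces.Box(-1.0, 1.0, shape=(2,)),
})

def flatten_dict_obs(space: spaces.Dict, obs: dict) -> np.ndarray:
    # Concatenate sub-observations into one flat vector (fixed key order)
    # so the reward network can consume the Dict observation as one array.
    return flatten(space, obs)

flat = flatten_dict_obs(obs_space, obs_space.sample())  # shape: (4,)
```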