Problem
Preference-based reinforcement learning on a POMDP problem. In the paper, the authors say that the reward model can use a recurrent neural network to solve the POMDP problem.
Solution
I added a GRU to the reward network to handle the POMDP problem. Please see my repo.
My main ideas (a sketch follows this list):
- `BufferingWrapper` and `RewardVecEnvWrapper` must be merged so that the `hidden_state` is saved alongside the `observation`, `action`, etc.
- To support recurrent reward network ensembling, I generate one `hidden_state` per ensemble member, i.e. as many hidden states as `ensemble_size`.
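Below is a minimal sketch of what I mean by a recurrent reward ensemble, not the actual code from my repo: the class names (`GRURewardNet`, `RewardEnsemble`) and hyperparameters are illustrative, and only the idea of one persistent `hidden_state` per ensemble member comes from the description above.

```python
import torch
import torch.nn as nn

class GRURewardNet(nn.Module):
    """GRU-based reward model: maps (obs, act) sequences to per-step rewards."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, obs, act, hidden_state=None):
        # obs: (batch, seq_len, obs_dim); act: (batch, seq_len, act_dim)
        x = torch.cat([obs, act], dim=-1)
        out, next_hidden = self.gru(x, hidden_state)
        reward = self.head(out).squeeze(-1)  # (batch, seq_len)
        return reward, next_hidden

class RewardEnsemble(nn.Module):
    """Ensemble of recurrent reward nets, one hidden state per member."""

    def __init__(self, obs_dim: int, act_dim: int, ensemble_size: int = 3):
        super().__init__()
        self.members = nn.ModuleList(
            [GRURewardNet(obs_dim, act_dim) for _ in range(ensemble_size)]
        )
        # One hidden state per member, advanced at every environment step;
        # these are what gets stored in the buffer next to obs/act.
        self.hidden_states = [None] * ensemble_size

    def step(self, obs, act):
        rewards = []
        for i, net in enumerate(self.members):
            r, self.hidden_states[i] = net(obs, act, self.hidden_states[i])
            rewards.append(r)
        return torch.stack(rewards).mean(dim=0)  # mean ensemble reward
```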
Result
I applied this in the `BipedalWalker-v3` env with `AbsorbAfterDoneWrapper` from your sister project seals (setup sketched below).
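For reference, here is the environment setup in one block. This assumes the standard `gymnasium`/`seals` usage (older seals versions import from `gym` instead); it is not a copy of my training script.

```python
import gymnasium as gym
from seals.util import AbsorbAfterDoneWrapper

env = gym.make("BipedalWalker-v3")
# Repeat an absorbing state after termination so episodes have a fixed
# length, which keeps trajectory fragments comparable for preference queries.
env = AbsorbAfterDoneWrapper(env)
```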
Addition
I added `dict_preference.py` to support `Dict`-type observation spaces.
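As an illustration of the idea behind `dict_preference.py` (the file itself may do this differently), a `Dict` observation can be flattened into a single vector before it reaches the reward network; the example space and helper below are made up for this sketch.

```python
import numpy as np
from gymnasium import spaces
from gymnasium.spaces.utils import flatten

# Example Dict observation space (illustrative only).
obs_space = spaces.Dict({
    "position": spaces.Box(-1.0, 1.0, shape=(2,)),
    "velocity": spaces.Box(-1.0, 1.0, shape=(2,)),
})

def flatten_dict_obs(space: spaces.Dict, obs: dict) -> np.ndarray:
    # Concatenate sub-observations into one flat vector (fixed key order)
    # so the reward network can consume the Dict observation as one array.
    return flatten(space, obs)

flat = flatten_dict_obs(obs_space, obs_space.sample())  # shape: (4,)
```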