The computation of "fixed_log_prob" and the corresponding part of the "get_loss" function are exactly the same.
The two parts are executed consecutively, so the two values ("fixed_log_prob" and "log_prob") are exactly the same.
Is there a reason you wrote the code like this?
Hi, I believe the code is correct. If you inspect "fixed_log_prob", it is a constant tensor (not dependent on the network parameters); if you inspect "log_prob", you will see "grad_fn = ..." (it does depend on the network parameters). This is exactly what we want for importance sampling (treat the current parameters as the fixed old policy); the same logic applies to get_kl().
pytorch-trpo/main.py, lines 108 to 119 at e200eb8
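A minimal sketch of the detach pattern described in the reply, not the repo's exact code: the policy network, the `normal_log_density` helper, and the batch shapes below are illustrative stand-ins. It shows why the two log-probability tensors are numerically equal at the first evaluation while only one of them carries gradients.

```python
import math
import torch

def normal_log_density(x, mean, log_std):
    # log N(x; mean, exp(log_std)^2), summed over action dimensions
    var = torch.exp(2 * log_std)
    return (-(x - mean) ** 2 / (2 * var) - log_std
            - 0.5 * math.log(2 * math.pi)).sum(dim=1, keepdim=True)

policy = torch.nn.Linear(4, 2)                    # toy stand-in for the policy network
log_std = torch.zeros(1, 2, requires_grad=True)   # learnable log std of a Gaussian policy

states = torch.randn(8, 4)
actions = torch.randn(8, 2)
advantages = torch.randn(8, 1)

# Old-policy log prob: same forward pass, but under no_grad -> constant tensor, no grad_fn.
with torch.no_grad():
    fixed_log_prob = normal_log_density(actions, policy(states), log_std)

def get_loss():
    # Recomputed from the current parameters, so this tensor carries grad_fn.
    log_prob = normal_log_density(actions, policy(states), log_std)
    # Importance ratio pi_new / pi_old: equal to 1 in value at the first evaluation
    # (log_prob == fixed_log_prob numerically), but its gradient w.r.t. the
    # parameters is non-zero, because only log_prob is differentiable.
    ratio = torch.exp(log_prob - fixed_log_prob)
    return -(advantages * ratio).mean()

loss = get_loss()
loss.backward()  # gradients flow through log_prob only, never through fixed_log_prob
```

So even though the two values coincide when the loss is first evaluated, treating one of them as a constant is what makes the surrogate objective (and its gradient) correspond to importance sampling against a fixed old policy.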