
Refactoring ideas for log_rewards #200

Open · younik opened this issue Oct 15, 2024 · 6 comments

@younik (Collaborator) commented Oct 15, 2024

Computing log_rewards requires access to the environment. However, Transitions, Trajectories, and States all expose log_rewards, which creates a complicated dependency among these classes.

I propose two solutions:

  1. We drop log_rewards from Transitions, Trajectories, and States. I suppose log_rewards is only needed in the GFlowNet classes, so we can compute it directly there. The only exception is PrioritizedReplayBuffer, to which we could add a scoring-function attribute, or store a score for each added object (see the sketch after this list).
    This solution has the drawback of removing the caching mechanism (is log_rewards computed multiple times for the same object? Is it a heavy computation?)
  2. We provide log_rewards at the initialization of Transitions, Trajectories, and States, without accepting None. This is problematic for States, since env.log_reward operates on states, making it a chicken-and-egg problem.
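A minimal sketch of the scoring-function idea from solution 1. The class and attribute names below (PrioritizedReplayBufferSketch, scoring_fn) are hypothetical illustrations, not existing torchgfn API:

# Hypothetical sketch: the buffer is handed a scoring function (e.g. built from
# env.log_reward) instead of the stored containers carrying log_rewards themselves.
from typing import Any, Callable, List, Tuple

import torch


class PrioritizedReplayBufferSketch:
    def __init__(self, scoring_fn: Callable[[Any], torch.Tensor], capacity: int = 1000):
        self.scoring_fn = scoring_fn
        self.capacity = capacity
        self._storage: List[Tuple[float, Any]] = []

    def add(self, obj: Any) -> None:
        # The score is computed once, when the object is added, so the stored
        # objects never need a log_rewards attribute of their own.
        score = float(self.scoring_fn(obj).sum())
        self._storage.append((score, obj))
        self._storage.sort(key=lambda pair: pair[0], reverse=True)
        del self._storage[self.capacity:]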
@hyeok9855 (Collaborator) commented Oct 18, 2024

IMHO, solution 2 seems more reasonable. I think the possible issue can be resolved by removing log_reward from States with further modifications (I've quickly checked, and it doesn't seem very tricky, e.g., creating another subclass of Container that includes both the States and log_reward; a rough sketch follows below).
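A rough sketch of that idea; the class name and fields below are hypothetical, not existing torchgfn API:

# Hypothetical sketch: a Container-like object that pairs a States batch with
# its log rewards, so States itself no longer carries a log_rewards field.
from dataclasses import dataclass
from typing import Any

import torch


@dataclass
class StatesWithLogRewards:
    states: Any                # a States batch (kept generic here)
    log_rewards: torch.Tensor  # shape (n_states,), supplied at construction, never None

    def __post_init__(self) -> None:
        if self.log_rewards is None:
            raise ValueError("log_rewards must be provided at initialization")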

@josephdviviano (Collaborator) commented Oct 18, 2024 via email

@younik (Collaborator, Author) commented Nov 1, 2024

I investigated this further, and it seems log_rewards is never computed inside Transitions and Trajectories, because log_rewards is never None.
In fact, at initialization, we do this (it was introduced here):

self._log_rewards = (
    log_rewards
    if log_rewards is not None
    else torch.full(size=(0,), fill_value=0, dtype=torch.float)
)

which ensures that _log_rewards is never None.

So, computation here is never triggered:

def log_rewards(self) -> torch.Tensor | None:
    """Returns the log rewards of the trajectories as a tensor of shape (n_trajectories,)."""
    if self._log_rewards is not None:
        assert self._log_rewards.shape == (self.n_trajectories,)
        return self._log_rewards
    if self.is_backward:
        return None
    try:
        return self.env.log_reward(self.last_states)
    except NotImplementedError:
        return torch.log(self.env.reward(self.last_states))

I checked that this is the case in this commit (tests run correctly): https://github.com/younik/torchgfn/tree/test-log-rewards-comp

This makes it easy to implement solution 2 and remove the env dependency outright. It also allows for a fair amount of code cleanup (in several places we check whether log_rewards is None).
@josephdviviano
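For illustration, a hedged sketch of what the pruned Trajectories code could look like under solution 2; this is a simplified stand-in, not the actual implementation:

# Sketch of solution 2 applied: log_rewards must be passed in at construction,
# so the property needs no env fallback and no None handling.
import torch


class TrajectoriesSketch:
    def __init__(self, n_trajectories: int, log_rewards: torch.Tensor):
        assert log_rewards.shape == (n_trajectories,)
        self.n_trajectories = n_trajectories
        self._log_rewards = log_rewards

    @property
    def log_rewards(self) -> torch.Tensor:
        # Always present; the env is never consulted here.
        return self._log_rewards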

@josephdviviano (Collaborator) commented Nov 5, 2024

Hi @younik, I think the easiest fix is to replace line 157 with the appropriate check (i.e., check whether _log_rewards is empty). But I also think we need to ensure that line 163 either never needs to be called (i.e., it is only updated externally) OR has a path to being called (i.e., the Transitions object carries state that determines whether the log rewards need to be updated).

I'm curious what you think?

@josephdviviano self-assigned this Nov 5, 2024
@younik (Collaborator, Author) commented Nov 5, 2024

> Hi @younik, I think the easiest fix is to replace line 157 with the appropriate check (i.e., check whether _log_rewards is empty). But I also think we need to ensure that line 163 either never needs to be called (i.e., it is only updated externally) OR has a path to being called (i.e., the Transitions object carries state that determines whether the log rewards need to be updated).
>
> I'm curious what you think?

I believe the semantics of "empty" should be "the trajectory is empty", and it shouldn't happen that we have n states with an empty log-rewards tensor. To indicate that something still needs to be computed, it is better to use None.
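To illustrate the distinction (plain PyTorch, not torchgfn code): an empty tensor still has a well-defined shape describing an empty container, whereas None is the natural flag for "not computed yet".

import torch

# An empty log-rewards tensor: shape (0,) matches a container with zero trajectories.
empty_log_rewards = torch.full(size=(0,), fill_value=0, dtype=torch.float)
assert empty_log_rewards.shape == (0,)

# None, by contrast, would signal that the rewards still need to be computed.
maybe_log_rewards = None
needs_computation = maybe_log_rewards is None
assert needs_computation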

However, it looks like we don't need log_rewards computation inside Trajectories (and maybe not inside Transitions either).
For maintainability, it is better to prune everything that is unused, because dead code hurts readability.
Of course, we must ensure the user is using it properly, but I believe this line already does that:

158   assert self._log_rewards.shape == (self.n_trajectories,) 

@josephdviviano (Collaborator) commented:
Yes, I agree with this. Sorry for the lag in my reply.
