Skip to content
This repository has been archived by the owner on Jun 2, 2023. It is now read-only.

Adding State Updating and # of Tasks to LSTM and RGCN models #104

Merged
merged 34 commits into from
Jun 7, 2021

Conversation

jzwart
Copy link
Member

@jzwart jzwart commented May 26, 2021

I added some functionality to the LSTM and RGCN models so that they can return the h and c states of the LSTM. The LSTM returns the most recent h and c states (i.e. most recent time step), while the RGCN returns the h and c states for every time step since it is looping through each timestep for the LSTM.

I also added options for predicting either one or two tasks for the LSTM and RGCN. Previously, the models only predicted two tasks (i.e. you had to supply observations for two target features), but now both models should be able to predict either one or two tasks while defaulting to predicting only one tasks. This will be a big update and I didn't change train.py or other relevant functions that will call the LSTM/RGCN models because I think we need to first see if these proposed changes will work for all the projects. I also wasn't sure what to call the number of prediction tasks - I ended up calling them tasks since that is what @jsadler2 refers to them in his multi-task modeling paper, but I'm open to other suggestions (e.g. targets, target_features, etc..). I had to update some of the loss functions too since they defaulted to assuming the model was predicting 2 modeling tasks.

Here's an example of running these models with the new updates. This example is just predicting a single task and returning the h and c states. I show that both the LSTM and RGCN h and c states can be adjusted, updated, and influence the model predictions when supplied to the LSTM / RGCN as initial h and c states. This workflow also works for 2 tasks but it isn't shown right now.

closes #98, #106

Copy link
Member

@aappling-usgs aappling-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jake, super cool to see these updates going into the shared river-dl codebase. I've made comments that reflect on decisions made by both you and Jeff, plus a few changes to Jeff's original code to maintain readability as the scope of this repo expands.

My one other major comment is about the interfaces you've designed here. I see this block in your ipynb:

old_preds_lstm = model_lstm(inputs)
old_preds_rgcn = model_rgcn(inputs)
h_lstm, c_lstm = model_lstm.rnn_layer.states
h_rgcn = model_rgcn.h_gr
c_rgcn = model_rgcn.c_gr

updated_h_lstm = h_lstm * 500 # update just the states of just the last time step
updated_c_lstm = c_lstm * 500
updated_h_rgcn = h_rgcn[:,-1,:] * 500  # rgcn returns entire timeseries of h & c states so we need to select most recent states 
updated_c_rgcn = c_rgcn[:,-1,:] * 500 

# initialize the states with the previously trained states 
model_lstm.rnn_layer.reset_states(states=[updated_h_lstm, updated_c_lstm])
new_preds_lstm = model_lstm(inputs) 
new_preds_rgcn = model_rgcn(inputs, h_init = updated_h_rgcn, c_init = updated_c_rgcn) 
new_h_lstm, new_c_lstm = model_lstm.rnn_layer.states
new_h_rgcn = model_rgcn.h_gr
new_c_rgcn = model_rgcn.c_gr

I see these differences in interfaces:

  1. To access or reset states in the LSTM, you accept or pass in a list, whereas to access or reset states in the RGCN, you accept or pass in two objects.
  2. RGCN returns the entire timeseries whereas LSTM returns just the most recent states; is there a need to preserve all values in the RGCN, or could that model be changed to store/change just the most recent state as well?
  3. You reset states in LSTM by calling the built-in reset_states method. Could you write such a method for RGCN as well rather than adding arguments to model_rgcn?

Could you make it so that both models have the same interface? That would make it easier to switch between RCGN and LSTM in any code that uses the state information.

river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/RGCN.py Outdated
"""

:param hidden_size: [int] the number of hidden units
:param A: [numpy array] adjacency matrix
:param tasks: [int] number of prediction tasks to perform - currently supports either 1 or 2 prediction tasks
:param dropout: [float] value between 0 and 1 for the probability of a reccurent element to be zero
:param flow_in_temp: [bool] whether the flow predictions should feed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter name suggests it'd be hard to generalize to other variables. Prod mostly to @jsadler2 to think about whether/how this parameter name and/or functionality should be adjusted to accommodate, say, DO predictions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be moved to the model level, not the layer level because we will be separating out the output layer from the RGCN layer.

river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/rnns.py Show resolved Hide resolved
river_dl/rnns.py Outdated Show resolved Hide resolved
river_dl/loss_functions.py Outdated Show resolved Hide resolved
river_dl/rnns.py Outdated Show resolved Hide resolved
river_dl/RGCN.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jsadler2 jsadler2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jzwart - like Alison said, great to see these additions to the shared codebase. I'm wondering if we should take what you've started and try to make it even more modular.

  1. have the RGCN layer return the h's and c's - I think this makes a lot of sense. As it is written now, the RGCN and the output "layers" are both in one layer class. If we just return the h's and c's we can swap out the output layer without changing the actual RGCN code. This will be useful for if we want to expand to say 3 outputs later on or if we want the model to predict the probability distribution.
  2. multi-task loss functions - I think we should rename and rewrite the loss functions so it is clear they are multi-task loss functions. These functions will take the a 1-d array of lambdas, one for each output variable. That way we don't have to explicitly have a different function for 1, 2, 3, ... variables.
  3. separate SingleTaskLSTMModel - I think it might make sense to have a separate SingleTaskLSTMModel class. It would be a very simple class, but I think it could make the code more maintainable/clear because we wouldn't have to write in all this logic about lambda, num_tasks, etc. That logic I guess would be upstream. If someone was running just a single-task model, they'd just use the SingleTaskLSTMModel and not the other one (maybe we call that one a MultiTaskLSTMModel).
  4. (something to think about) One multi-task output layer - In line with simplifying, I wonder if we could just have all of the "tasks" share just one output layer. This would really simplify the model definition. We could just have pass (or infer) the number of outputs and then the output layer would output that number. That would preclude the fancy gradient correction routine, but I'm not sure we need that (at least in the main branch).

If you are good with it, Jake, I can commit code to your branch that addresses 1, 2, and 3 above.

# was b2
self.b_out = self.add_weight(
shape=[1], initializer="zeros", name="b_out"
)

@tf.function
def call(self, inputs, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to add a docstring to this function so that we are clear on what the arguments are.

river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/loss_functions.py Outdated Show resolved Hide resolved
river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/rnns.py Outdated Show resolved Hide resolved
@jzwart
Copy link
Member Author

jzwart commented Jun 2, 2021

Could you make it so that both models have the same interface? That would make it easier to switch between RCGN and LSTM in any code that uses the state information.

Yes, I think that's a good idea and will update so that they have similar interfaces.

If you are good with it, Jake, I can commit code to your branch that addresses 1, 2, and 3 above.

That'd be great @jsadler2 , I will add a few other commits based on @aappling-usgs and your comments as well.

jzwart and others added 7 commits June 2, 2021 15:04
Co-authored-by: Alison Appling <aappling@usgs.gov>
no magic numbers

Co-authored-by: Alison Appling <aappling@usgs.gov>
Co-authored-by: Alison Appling <aappling@usgs.gov>
@jzwart
Copy link
Member Author

jzwart commented Jun 2, 2021

OK @jsadler2 , I've addressed most but not all comments. If you want to commit code to this branch addressing your points 1, 2, and 3 above, then go ahead. I won't commit code for a bit.

@jsadler2
Copy link
Collaborator

jsadler2 commented Jun 2, 2021

Okay @jzwart - I added a few commits that get at my 1, 2, and 3 points from above. I'm realizing that there is some more simplifying that I think can/should be done here, but I'm not sure if it belongs in this PR. For example, taking out the sample weights relates to the loss functions (#98), but that may be beyond scope here.

Co-authored-by: Alison Appling <aappling@usgs.gov>
river_dl/rnns.py Show resolved Hide resolved
river_dl/RGCN.py Outdated Show resolved Hide resolved
@jzwart
Copy link
Member Author

jzwart commented Jun 3, 2021

looks good @jsadler2 , I added a couple comments

river_dl/RGCN.py Outdated Show resolved Hide resolved
river_dl/predict.py Outdated Show resolved Hide resolved
@jsadler2
Copy link
Collaborator

jsadler2 commented Jun 4, 2021

@jzwart - I got a little carried away :). I felt like just one thing led to another and I ended doing a couple more major changes that related to what you started.

So now in addition to that you wrote in, i.e.,:

  • the states attribute in both the RGCNModel and the LSTMModel being exposed
  • the num_tasks argument

I also

These two additional changes were mostly prompted by the num_tasks option that you introduced.

I think all the changes together will make the codebase both simpler and more flexible.

I'm done making changes, now, so if you could look over what I've done, that'd be great.

default="rgcn",
)
parser.add_argument(
"--lamb", help="lambda for weighting aux gradient", default=1.0, type=float
"--num_tasks",
help="number of tasks (outputs to be predicted)",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help="number of tasks (outputs to be predicted)",
help="number of tasks (variables to be predicted)",

@jzwart
Copy link
Member Author

jzwart commented Jun 7, 2021

I think it looks good @jsadler2 ! I like the unified approach to all the models and similar output for predictions and states. There is a slight difference in how the LSTM and RGCN states are reset and I wonder if we could make them the same. For the RGCN we'd reset by supplying the h and c states:

rgcn_preds = rgcn_model(inputs = inputs, h_init = h, c_init = c) 

but the LSTM states would need to be reset before making predictions rather than supplying to the function:

lstm_model.rnn_layer.reset_states(states = [h, c]) 
lstm_preds = lstm_model(inputs = inputs) 

It'd be nice to make those the same so we don't have to change the prediction / training code as much depending on the model type we're using. I think we'd just need to add an h_init and c_init argument to the LSTMModel call and reset the states in there (either to zero or the supplied h and c states). I'll add a suggestion where I think it would go.

Other than that, I think the changes look good and it can be merged

self.gradient_correction = gradient_correction
self.grad_log_file = grad_log_file
self.lamb = lamb
self.num_tasks = num_tasks
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add:

self.hidden_size = hidden_size 

if doing the suggested h & c state reset below

# adjust auxiliary gradient
gradient_shared_aux = adjust_gradient_list(
gradient_shared_main, gradient_shared_aux, self.grad_log_file
x, h, c = self.rnn_layer(inputs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose adding in the following right above this line to make the reset of the h and c states more similar to RGCN model:

batch_size = inputs.shape[0]
h_init = kwargs.get("h_init", tf.zeros([batch_size, self.hidden_size]))
c_init = kwargs.get("c_init", tf.zeros([batch_size, self.hidden_size]))
self.rnn_layer.reset_states(states = [h_init, c_init]) 

I think this should work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

@jsadler2
Copy link
Collaborator

jsadler2 commented Jun 7, 2021

Back to you, @jzwart. I'm good with this merge now if you are.

@jzwart
Copy link
Member Author

jzwart commented Jun 7, 2021

great, looks good to me. Merging

@jzwart jzwart merged commit ec2d9b9 into master Jun 7, 2021
@jsadler2 jsadler2 linked an issue Jun 7, 2021 that may be closed by this pull request
@jzwart jzwart deleted the adding_da_to_models branch June 7, 2021 18:59
janetrbarclay added a commit to janetrbarclay/river-dl that referenced this pull request Jun 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consider taking out sample weights unifying multi-task implementation
3 participants