Adding State Updating and # of Tasks to LSTM and RGCN models #104

jzwart · 2021-05-26T15:45:56Z

I added some functionality to the LSTM and RGCN models so that they can return the h and c states of the LSTM. The LSTM returns the most recent h and c states (i.e. most recent time step), while the RGCN returns the h and c states for every time step since it is looping through each timestep for the LSTM.

I also added options for predicting either one or two tasks for the LSTM and RGCN. Previously, the models only predicted two tasks (i.e. you had to supply observations for two target features), but now both models should be able to predict either one or two tasks while defaulting to predicting only one tasks. This will be a big update and I didn't change train.py or other relevant functions that will call the LSTM/RGCN models because I think we need to first see if these proposed changes will work for all the projects. I also wasn't sure what to call the number of prediction tasks - I ended up calling them tasks since that is what @jsadler2 refers to them in his multi-task modeling paper, but I'm open to other suggestions (e.g. targets, target_features, etc..). I had to update some of the loss functions too since they defaulted to assuming the model was predicting 2 modeling tasks.

Here's an example of running these models with the new updates. This example is just predicting a single task and returning the h and c states. I show that both the LSTM and RGCN h and c states can be adjusted, updated, and influence the model predictions when supplied to the LSTM / RGCN as initial h and c states. This workflow also works for 2 tasks but it isn't shown right now.

closes #98, #106

aappling-usgs

Hi Jake, super cool to see these updates going into the shared river-dl codebase. I've made comments that reflect on decisions made by both you and Jeff, plus a few changes to Jeff's original code to maintain readability as the scope of this repo expands.

My one other major comment is about the interfaces you've designed here. I see this block in your ipynb:

old_preds_lstm = model_lstm(inputs)
old_preds_rgcn = model_rgcn(inputs)
h_lstm, c_lstm = model_lstm.rnn_layer.states
h_rgcn = model_rgcn.h_gr
c_rgcn = model_rgcn.c_gr

updated_h_lstm = h_lstm * 500 # update just the states of just the last time step
updated_c_lstm = c_lstm * 500
updated_h_rgcn = h_rgcn[:,-1,:] * 500  # rgcn returns entire timeseries of h & c states so we need to select most recent states 
updated_c_rgcn = c_rgcn[:,-1,:] * 500 

# initialize the states with the previously trained states 
model_lstm.rnn_layer.reset_states(states=[updated_h_lstm, updated_c_lstm])
new_preds_lstm = model_lstm(inputs) 
new_preds_rgcn = model_rgcn(inputs, h_init = updated_h_rgcn, c_init = updated_c_rgcn) 
new_h_lstm, new_c_lstm = model_lstm.rnn_layer.states
new_h_rgcn = model_rgcn.h_gr
new_c_rgcn = model_rgcn.c_gr

I see these differences in interfaces:

To access or reset states in the LSTM, you accept or pass in a list, whereas to access or reset states in the RGCN, you accept or pass in two objects.
RGCN returns the entire timeseries whereas LSTM returns just the most recent states; is there a need to preserve all values in the RGCN, or could that model be changed to store/change just the most recent state as well?
You reset states in LSTM by calling the built-in reset_states method. Could you write such a method for RGCN as well rather than adding arguments to model_rgcn?

Could you make it so that both models have the same interface? That would make it easier to switch between RCGN and LSTM in any code that uses the state information.

river_dl/RGCN.py

aappling-usgs · 2021-05-26T18:19:16Z

river_dl/RGCN.py

        """

        :param hidden_size: [int] the number of hidden units
        :param A: [numpy array] adjacency matrix
+        :param tasks: [int] number of prediction tasks to perform - currently supports either 1 or 2 prediction tasks 
+        :param dropout: [float] value between 0 and 1 for the probability of a reccurent element to be zero  
        :param flow_in_temp: [bool] whether the flow predictions should feed


This parameter name suggests it'd be hard to generalize to other variables. Prod mostly to @jsadler2 to think about whether/how this parameter name and/or functionality should be adjusted to accommodate, say, DO predictions.

I think this should be moved to the model level, not the layer level because we will be separating out the output layer from the RGCN layer.

river_dl/RGCN.py

river_dl/rnns.py

river_dl/loss_functions.py

river_dl/rnns.py

river_dl/RGCN.py

jsadler2

@jzwart - like Alison said, great to see these additions to the shared codebase. I'm wondering if we should take what you've started and try to make it even more modular.

have the RGCN layer return the h's and c's - I think this makes a lot of sense. As it is written now, the RGCN and the output "layers" are both in one layer class. If we just return the h's and c's we can swap out the output layer without changing the actual RGCN code. This will be useful for if we want to expand to say 3 outputs later on or if we want the model to predict the probability distribution.
multi-task loss functions - I think we should rename and rewrite the loss functions so it is clear they are multi-task loss functions. These functions will take the a 1-d array of lambdas, one for each output variable. That way we don't have to explicitly have a different function for 1, 2, 3, ... variables.
separate SingleTaskLSTMModel - I think it might make sense to have a separate SingleTaskLSTMModel class. It would be a very simple class, but I think it could make the code more maintainable/clear because we wouldn't have to write in all this logic about lambda, num_tasks, etc. That logic I guess would be upstream. If someone was running just a single-task model, they'd just use the SingleTaskLSTMModel and not the other one (maybe we call that one a MultiTaskLSTMModel).
(something to think about) One multi-task output layer - In line with simplifying, I wonder if we could just have all of the "tasks" share just one output layer. This would really simplify the model definition. We could just have pass (or infer) the number of outputs and then the output layer would output that number. That would preclude the fancy gradient correction routine, but I'm not sure we need that (at least in the main branch).

If you are good with it, Jake, I can commit code to your branch that addresses 1, 2, and 3 above.

jsadler2 · 2021-06-02T16:45:06Z

river_dl/RGCN.py

+                # was b2
+                self.b_out = self.add_weight(
+                    shape=[1], initializer="zeros", name="b_out"
+                )

    @tf.function
    def call(self, inputs, **kwargs):


We probably need to add a docstring to this function so that we are clear on what the arguments are.

river_dl/RGCN.py

river_dl/loss_functions.py

river_dl/RGCN.py

river_dl/rnns.py

jzwart · 2021-06-02T18:55:36Z

Could you make it so that both models have the same interface? That would make it easier to switch between RCGN and LSTM in any code that uses the state information.

Yes, I think that's a good idea and will update so that they have similar interfaces.

If you are good with it, Jake, I can commit code to your branch that addresses 1, 2, and 3 above.

That'd be great @jsadler2 , I will add a few other commits based on @aappling-usgs and your comments as well.

Co-authored-by: Alison Appling <aappling@usgs.gov>

no magic numbers Co-authored-by: Alison Appling <aappling@usgs.gov>

Co-authored-by: Alison Appling <aappling@usgs.gov>

into adding_da_to_models

jzwart · 2021-06-02T19:38:22Z

OK @jsadler2 , I've addressed most but not all comments. If you want to commit code to this branch addressing your points 1, 2, and 3 above, then go ahead. I won't commit code for a bit.

jsadler2 · 2021-06-02T22:28:13Z

Okay @jzwart - I added a few commits that get at my 1, 2, and 3 points from above. I'm realizing that there is some more simplifying that I think can/should be done here, but I'm not sure if it belongs in this PR. For example, taking out the sample weights relates to the loss functions (#98), but that may be beyond scope here.

Co-authored-by: Alison Appling <aappling@usgs.gov>

river_dl/rnns.py

river_dl/RGCN.py

jzwart · 2021-06-03T16:01:01Z

looks good @jsadler2 , I added a couple comments

… adding_da_to_models

river_dl/RGCN.py

(just add losses together for multitask)

this check throws error and is handled upstream

river_dl/predict.py

jsadler2 · 2021-06-04T19:42:19Z

@jzwart - I got a little carried away :). I felt like just one thing led to another and I ended doing a couple more major changes that related to what you started.

So now in addition to that you wrote in, i.e.,:

the states attribute in both the RGCNModel and the LSTMModel being exposed
the num_tasks argument

I also

removed the use of weights in the loss functions and took that out of the train script (consider taking out sample weights #98)
unified the multitasking approach so that now the LSTMModel and the GRUModel are much simpler (unifying multi-task implementation #106)

These two additional changes were mostly prompted by the num_tasks option that you introduced.

I think all the changes together will make the codebase both simpler and more flexible.

I'm done making changes, now, so if you could look over what I've done, that'd be great.

jzwart · 2021-06-07T13:39:18Z

river_dl/train_model.py

    default="rgcn",
 )
 parser.add_argument(
-    "--lamb", help="lambda for weighting aux gradient", default=1.0, type=float
+    "--num_tasks",
+    help="number of tasks (outputs to be predicted)",


Suggested change

help="number of tasks (outputs to be predicted)",

help="number of tasks (variables to be predicted)",

jzwart · 2021-06-07T13:53:05Z

I think it looks good @jsadler2 ! I like the unified approach to all the models and similar output for predictions and states. There is a slight difference in how the LSTM and RGCN states are reset and I wonder if we could make them the same. For the RGCN we'd reset by supplying the h and c states:

rgcn_preds = rgcn_model(inputs = inputs, h_init = h, c_init = c)

but the LSTM states would need to be reset before making predictions rather than supplying to the function:

lstm_model.rnn_layer.reset_states(states = [h, c]) 
lstm_preds = lstm_model(inputs = inputs)

It'd be nice to make those the same so we don't have to change the prediction / training code as much depending on the model type we're using. I think we'd just need to add an h_init and c_init argument to the LSTMModel call and reset the states in there (either to zero or the supplied h and c states). I'll add a suggestion where I think it would go.

Other than that, I think the changes look good and it can be merged

jzwart · 2021-06-07T13:58:21Z

river_dl/rnns.py

-        self.gradient_correction = gradient_correction
-        self.grad_log_file = grad_log_file
-        self.lamb = lamb
+        self.num_tasks = num_tasks


need to add:

self.hidden_size = hidden_size

if doing the suggested h & c state reset below

jzwart · 2021-06-07T14:00:23Z

river_dl/rnns.py

-            # adjust auxiliary gradient
-            gradient_shared_aux = adjust_gradient_list(
-                gradient_shared_main, gradient_shared_aux, self.grad_log_file
+        x, h, c = self.rnn_layer(inputs)


I propose adding in the following right above this line to make the reset of the h and c states more similar to RGCN model:

batch_size = inputs.shape[0] h_init = kwargs.get("h_init", tf.zeros([batch_size, self.hidden_size])) c_init = kwargs.get("c_init", tf.zeros([batch_size, self.hidden_size])) self.rnn_layer.reset_states(states = [h_init, c_init])

I think this should work?

jsadler2 · 2021-06-07T18:49:17Z

Back to you, @jzwart. I'm good with this merge now if you are.

jzwart · 2021-06-07T18:58:50Z

great, looks good to me. Merging

Jacob Zwart added 3 commits May 25, 2021 11:57

adding DA capabilities to LSTM

94d0454

updating loss functions and adding tasks to RGCN

b9f5393

updating parameter definitions

a1ff04f

jzwart requested review from jsadler2 and aappling-usgs May 26, 2021 15:46

aappling-usgs reviewed May 27, 2021

View reviewed changes

jsadler2 reviewed Jun 2, 2021

View reviewed changes

river_dl/rnns.py Outdated Show resolved Hide resolved

updating dropout argument names

b7e9046

jzwart and others added 7 commits June 2, 2021 15:04

cleaning LSTM

494b931

Co-authored-by: Alison Appling <aappling@usgs.gov>

Being explicit about the number of tasks

b3f8df9

no magic numbers Co-authored-by: Alison Appling <aappling@usgs.gov>

cleaning up init states call

34e9983

Co-authored-by: Alison Appling <aappling@usgs.gov>

Updating rgcn for clarity

10ffdbd

Merge branch 'adding_da_to_models' of https://github.com/USGS-R/river-dl

c97f179

into adding_da_to_models

updating lambda_aux argument

cf95324

making stateful based on return_state argument

088d4a7

jsadler2 added 4 commits June 2, 2021 16:34

separating output, rgcn layers

4101879

making a more generic multitask loss function

6fc687d

adding SingletaskLSTMModel

7c36a4f

Convert to MultitaskLSTM; update GRU classes

c582d3d

explicit about number of tasks for y_data_components

ab4cff1

Co-authored-by: Alison Appling <aappling@usgs.gov>

jzwart commented Jun 3, 2021

View reviewed changes

river_dl/rnns.py Show resolved Hide resolved

jzwart commented Jun 3, 2021

View reviewed changes

river_dl/RGCN.py Outdated Show resolved Hide resolved

jsadler2 added 3 commits June 4, 2021 10:44

convenience fxn weighted_masked_rmse

3129bd5

lambda_aux -> lambdas in rnns

c8c140a

num_tasks, lambdas in train functions

3e54465

Merge branch 'adding_da_to_models' of github.com:USGS-R/river-dl into…

e939460

… adding_da_to_models

jsadler2 reviewed Jun 4, 2021

View reviewed changes

river_dl/RGCN.py Outdated Show resolved Hide resolved

jsadler2 added 9 commits June 4, 2021 11:51

[#106] taking train_step out in rnns

17bfab8

(just add losses together for multitask)

[#106] provide loss_func to train func; compiles rnns

2c7580b

[#98] multitask nse, kge functions; rm weights

3799509

[#106] match train cli with train.py fxn

a4d54e2

take out unneeded check on h_/c_init in RGCN

0f568ae

this check throws error and is handled upstream

[#98] don't pass weights to fit call

6cd7b52

add num_tasks to predict fxns

9bc45ca

Snakefile updates for lambdas, num_tasks, loss_func

63bb515

RGCN states attribute; just final states

8c80003

jsadler2 reviewed Jun 4, 2021

View reviewed changes

river_dl/predict.py Outdated Show resolved Hide resolved

jsadler2 added 3 commits June 4, 2021 14:13

typo in predict

3eca6f2

Black formatting and docstring corrections

4527d67

attr for rnns

8d061f8

jzwart commented Jun 7, 2021

View reviewed changes

jsadler2 added 2 commits June 7, 2021 13:40

"outputs" -> "variables" in num_task docstring

1c1641a

can provide h_/c_init to initalize rnn

8362981

jsadler2 mentioned this pull request Jun 7, 2021

consider taking out sample weights #98

Open

jzwart merged commit ec2d9b9 into master Jun 7, 2021

jsadler2 linked an issue Jun 7, 2021 that may be closed by this pull request

unifying multi-task implementation #106

Closed

jzwart deleted the adding_da_to_models branch June 7, 2021 18:59

janetrbarclay added a commit to janetrbarclay/river-dl that referenced this pull request Jun 16, 2021

merging in USGS-R#104

bde5db6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding State Updating and # of Tasks to LSTM and RGCN models #104

Adding State Updating and # of Tasks to LSTM and RGCN models #104

jzwart commented May 26, 2021 •

edited by jsadler2

Loading

aappling-usgs left a comment •

edited

Loading

aappling-usgs May 26, 2021

jsadler2 Jun 2, 2021

jsadler2 left a comment

jsadler2 Jun 2, 2021

jzwart commented Jun 2, 2021

jzwart commented Jun 2, 2021

jsadler2 commented Jun 2, 2021

jzwart commented Jun 3, 2021

jsadler2 commented Jun 4, 2021

jzwart Jun 7, 2021

jzwart commented Jun 7, 2021

jzwart Jun 7, 2021

jzwart Jun 7, 2021

jsadler2 Jun 7, 2021

jsadler2 commented Jun 7, 2021

jzwart commented Jun 7, 2021

	help="number of tasks (outputs to be predicted)",
	help="number of tasks (variables to be predicted)",

Adding State Updating and # of Tasks to LSTM and RGCN models #104

Adding State Updating and # of Tasks to LSTM and RGCN models #104

Conversation

jzwart commented May 26, 2021 • edited by jsadler2 Loading

aappling-usgs left a comment • edited Loading

Choose a reason for hiding this comment

aappling-usgs May 26, 2021

Choose a reason for hiding this comment

jsadler2 Jun 2, 2021

Choose a reason for hiding this comment

jsadler2 left a comment

Choose a reason for hiding this comment

jsadler2 Jun 2, 2021

Choose a reason for hiding this comment

jzwart commented Jun 2, 2021

jzwart commented Jun 2, 2021

jsadler2 commented Jun 2, 2021

jzwart commented Jun 3, 2021

jsadler2 commented Jun 4, 2021

jzwart Jun 7, 2021

Choose a reason for hiding this comment

jzwart commented Jun 7, 2021

jzwart Jun 7, 2021

Choose a reason for hiding this comment

jzwart Jun 7, 2021

Choose a reason for hiding this comment

jsadler2 Jun 7, 2021

Choose a reason for hiding this comment

jsadler2 commented Jun 7, 2021

jzwart commented Jun 7, 2021

jzwart commented May 26, 2021 •

edited by jsadler2

Loading

aappling-usgs left a comment •

edited

Loading