feat(app): 1st converging cloud microphysics model
This commit exhibits nearly monotonic convergence as measured by the
cost function decreasing three orders of magnitude in the first 120
iterations.  To reproduce this behavior, execute

./build/run-fpm.sh run -- --base training --epochs 120 --stride 720

with the present working directory containing the 29.7 GB
training_input.nc and training_output.nc produced for the
"Colorado benchmark simulation" using commit d7aa958 on the
neural-net branch of https://github.com/berkeleylab/icar,
which uses the simplest of ICAR's cloud microphysics models.
The Inference-Engine run uses

* A single time instant (as determined by the above stride),
* A 30% retention rate of grid points where time derivatives vanish,
* Zero initial weights and biases,
* A batch size equal to the entire time instant,
* Gradient descent with no optimizer, and
* A single mini-batch (sketched in code below).
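
The two most consequential of these settings appear in
app/train-cloud-microphysics.f90.  The fragment below is reconstructed
from the diff further down, with module imports and surrounding logic
omitted, so it is a sketch rather than a complete program; the comments
annotate how the settings map onto the code:

! zero initial weights and biases: random=.false. presumably suppresses
! the random initialization noted above
trainable_engine = new_engine(num_hidden_layers=12, nodes_per_hidden_layer=16, &
  num_inputs=8, num_outputs=6, random=.false.)

! a single mini-batch: one bin spans every input/output pair, so the
! batch size equals the entire (strided) time instant
associate(num_pairs => size(input_output_pairs), n_bins => 1)
  bins = [(bin_t(num_items=num_pairs, num_bins=n_bins, bin_number=b), b = 1, n_bins)]
end associate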

The program shuffles the data set in order to facilitate
stochastic gradient descent.  However, because a single mini-batch
is used, the cost function is computed across the entire data set
on every iteration; reordering the pairs within one all-encompassing
bin leaves the computed gradient unchanged, so shuffling has no
effect and this presumably amounts to plain (full-batch) gradient
descent.

Because a single time instant is used, this case reflects the
behavior that might be expected if Inference-Engine is integrated
into ICAR and training happens during an ICAR run.  In such a
scenario, it might be desirable to iterate on each time instant
as soon as the time step completes.  Doing so might help to
pretrain the network, promoting faster convergence in any
additional training performed after the ICAR run if the data is
saved.  Alternatively, training at ICAR runtime might obviate the
need for saving large training data sets.
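
A minimal sketch of that per-time-step workflow follows.  Every name
in it is a hypothetical placeholder; nothing below appears in this
commit, and it only illustrates iterating on each time instant as
its time step completes:

program online_training_sketch
  !! Hypothetical sketch: train on each time instant as soon as the
  !! corresponding ICAR time step completes.  All names are placeholders.
  implicit none
  integer :: step
  integer, parameter :: num_steps = 100  ! placeholder step count

  do step = 1, num_steps
    ! placeholder for one ICAR time step
    call advance_icar_one_step()
    ! placeholder for harvesting the step's input/output tensor pairs
    ! and running a few gradient-descent iterations on them
    call train_on_latest_instant()
  end do

contains

  subroutine advance_icar_one_step()
    ! stand-in for ICAR's time integration
  end subroutine

  subroutine train_on_latest_instant()
    ! stand-in for updating the network weights from the new instant
  end subroutine

end program online_training_sketch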
rouson committed Sep 11, 2023
1 parent cb5119f commit b09e77c
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions app/train-cloud-microphysics.f90
@@ -194,7 +194,7 @@ subroutine read_train_write
 else
 close(network_unit)
 print *,"Initializing a new network"
-trainable_engine = new_engine(num_hidden_layers=12, nodes_per_hidden_layer=16, num_inputs=8, num_outputs=6, random=.true.)
+trainable_engine = new_engine(num_hidden_layers=12, nodes_per_hidden_layer=16, num_inputs=8, num_outputs=6, random=.false.)
 end if

 print *,"Defining tensors from time steps 1 through", t_end, "with strides of", stride
@@ -229,7 +229,7 @@ subroutine read_train_write
 end associate


-associate(num_pairs => size(input_output_pairs), n_bins => size(input_output_pairs)/10000)
+associate(num_pairs => size(input_output_pairs), n_bins => 1) ! also tried n_bins => size(input_output_pairs)/10000
 bins = [(bin_t(num_items=num_pairs, num_bins=n_bins, bin_number=b), b = 1, n_bins)]

 print *,"Training network"
