
Implement early stopping #28

Merged · 6 commits · Sep 25, 2024

Conversation

akorgor
Collaborator

@akorgor akorgor commented May 29, 2024

This PR replaces PR #2 and implements the early-stopping algorithm as described in the corresponding evidence-accumulation task implemented in TensorFlow. The only difference is that here the early-stopping criterion is evaluated after each validation step (e.g., every ten iterations) rather than after every iteration as in the TensorFlow implementation. Since the criterion is always assessed against the newest validation value, re-evaluating it for ten iterations against the same validation result, as the TensorFlow implementation does, seems wasteful.
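The validation cadence described above can be sketched as follows; all names here (n_iter, n_validate_every, stop_crit, compute_validation_error) are illustrative placeholders, not the PR's actual API:

```python
# Minimal sketch: the early-stopping criterion is checked only at validation
# steps (every n_validate_every iterations), not after every iteration.
n_iter = 100
n_validate_every = 10
stop_crit = 0.65


def compute_validation_error(k_iter):
    # placeholder: a monotonically decreasing error, for illustration only
    return 0.75 - 0.002 * k_iter


stopped_at = None
for k_iter in range(n_iter):
    if k_iter % n_validate_every == 0:
        error_val = compute_validation_error(k_iter)
        # skip the check at k_iter == 0, before any weight update
        if k_iter > 0 and error_val < stop_crit:
            stopped_at = k_iter
            break
    # ... training step would run here ...

print(stopped_at)  # first validation step whose error falls below stop_crit
```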

Currently, the NEST losses match the TF losses only for the iterations in which no weight update has happened yet, i.e., the first two iterations.

NEST:
0.74115255000619 validation
0.75886758496570 training
0.64575804540432 training
0.65036521347625 training
0.73954336799350 training
0.64857381599914 training
0.63293547882357 test
0.74871743652812 test
0.63259857933630 test
0.65171656508917 test
TF:
0.74115252494812 validation
0.75886762142181 training
0.64488172531128 training
0.63414341211319 training
0.74553966522217 training
0.65522724390030 training
0.62222802639008 test
0.75369793176651 test
0.63924121856689 test
0.65074861049652 test
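As a rough check of how closely the two tables agree, the absolute differences of the first few entries can be computed directly (a throwaway snippet, not part of the PR):

```python
# First three entries of the NEST and TF loss tables above
nest_losses = [0.74115255000619, 0.75886758496570, 0.64575804540432]
tf_losses = [0.74115252494812, 0.75886762142181, 0.64488172531128]

diffs = [abs(n - t) for n, t in zip(nest_losses, tf_losses)]
for d in diffs:
    print(f"{d:.2e}")
# The first two entries (before any weight update) agree to ~1e-8,
# while the third already deviates in the third decimal digit.
```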

To test whether these deviations stem from an extra spike caused by numerical differences between TensorFlow and NEST, in the following experiment a recurrent neuron (index 82) was forced to emit an extra spike at t = 4000, i.e., during the second iteration. This perturbation changes the loss only in its 11th decimal digit.

NEST (perturbed):
0.74115255000619 validation
0.75886758494402 training
0.64732471934116 training
0.65337249687406 training
0.73455409141632 training
0.64412005064290 training
0.64022872139437 test
0.74554388366270 test
0.62969715145448 test
0.64486264066109 test
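The stated size of the perturbation effect can be verified from the tables: the second-row training loss with and without the forced spike first differs in the 11th decimal digit. The helper below is a throwaway check, not part of the PR:

```python
def first_diff_decimal_digit(a, b, precision=14):
    """1-based index of the first decimal digit at which a and b differ."""
    da = f"{a:.{precision}f}".split(".")[1]
    db = f"{b:.{precision}f}".split(".")[1]
    for i, (x, y) in enumerate(zip(da, db), start=1):
        if x != y:
            return i
    return None  # identical up to the given precision


# Second-iteration training loss, unperturbed vs. perturbed (tables above)
print(first_diff_decimal_digit(0.75886758496570, 0.75886758494402))  # -> 11
```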

Since a single extra spike perturbs the loss only far beyond the observed differences, the deviation between TF and NEST is therefore probably not a numerical effect but due to the gradients not being computed correctly.

  • investigate reason for deviation between TF and NEST

@akorgor akorgor requested a review from JesusEV May 29, 2024 08:31

Pull request automatically marked stale!

@akorgor
Collaborator Author

akorgor commented Sep 22, 2024

When the do_early_stopping flag is set to False, the losses of the original script without the early-stopping framework are recovered.

@github-actions github-actions bot removed the stale label Sep 23, 2024
Comment on lines 681 to 702
n_iter_sim = 0
nest.Simulate(duration["total_offset"])

phase_label_previous = ""
for k_iter in range(n_iter):
if do_early_stopping and k_iter % n_validate_every == 0:
error_val, n_iter_sim, phase_label_previous = run("validation", n_iter_sim, eta_test, phase_label_previous)

if k_iter > 0 and error_val < stop_crit:
errors_early_stop, n_iter_sim, phase_label_previous = run(
"early-stopping", n_iter_sim, eta_test, phase_label_previous
)
if np.mean(errors_early_stop) < stop_crit:
break

run_iter = min(n_iter - k_iter, n_validate_every)
_, n_iter_sim, phase_label_previous = run("training", n_iter_sim, eta_train, phase_label_previous)

for _ in range(n_test):
_, n_iter_sim, phase_label_previous = run("test", n_iter_sim, eta_test, phase_label_previous)

nest.Simulate(steps["extension_sim"])
Owner

Could the loop structure be reorganized to make the phase separation clearer? Currently, the only indication is the string labels, but they don’t seem obvious enough. Would a reorganization like the following be more effective?

def simulate_training_phase(n_iter_sim, eta_train, phase_label_previous):
    run_iter = min(n_iter - k_iter, n_validate_every)
    return run("training", n_iter_sim, eta_train, phase_label_previous)


def simulate_validation_phase(n_iter_sim, eta_test, phase_label_previous):
    return run("validation", n_iter_sim, eta_test, phase_label_previous)


def simulate_early_stopping_phase(n_iter_sim, eta_test, phase_label_previous):
    errors_early_stop, n_iter_sim, phase_label_previous = run("early-stopping", n_iter_sim, eta_test, phase_label_previous)
    if np.mean(errors_early_stop) < stop_crit:
        return True, n_iter_sim, phase_label_previous
    return False, n_iter_sim, phase_label_previous


def simulate_test_phase(n_test, n_iter_sim, eta_test, phase_label_previous):
    for _ in range(n_test):
        _, n_iter_sim, phase_label_previous = run("test", n_iter_sim, eta_test, phase_label_previous)


n_iter_sim = 0
nest.Simulate(duration["total_offset"])
phase_label_previous = ""

for k_iter in range(n_iter):
    # Validation phase and early stopping check
    if do_early_stopping and k_iter % n_validate_every == 0:
        error_val, n_iter_sim, phase_label_previous = simulate_validation_phase(n_iter_sim, eta_test, phase_label_previous)

        if k_iter > 0 and error_val < stop_crit:
            should_stop, n_iter_sim, phase_label_previous = simulate_early_stopping_phase(n_iter_sim, eta_test, phase_label_previous)
            if should_stop:
                print(f"Early stopping at iteration {k_iter}")
                break

    # Training phase
    _, n_iter_sim, phase_label_previous = simulate_training_phase(n_iter_sim, eta_train, phase_label_previous)

# Test phase
simulate_test_phase(n_test, n_iter_sim, eta_test, phase_label_previous)

Collaborator Author

Nice idea! Please see 4c13272.

Owner

It looks great. The compartmentalization makes it look much cleaner and more organized. I'm sure this should make maintenance and modification easier. In particular, I like the new run_early_stopping function, which isolates the core early-stopping logic into a few easy-to-follow lines.

@akorgor akorgor merged commit 3b1b930 into JesusEV:eprop_bio_feature Sep 25, 2024
17 of 19 checks passed
@akorgor akorgor deleted the feat_early-stopping branch September 25, 2024 14:32
akorgor pushed a commit that referenced this pull request Sep 27, 2024
Changing parameter values for astrocyte_lr_1994