Move user docs on surrogate gradient to eprop_iaf and include elsewhere
akorgor committed Sep 11, 2024
1 parent 60a80ee commit 757b064
Showing 5 changed files with 77 additions and 84 deletions.
56 changes: 54 additions & 2 deletions models/eprop_iaf.h
@@ -103,8 +103,39 @@
voltage :math:`\psi_j^{t-1}` (the product of which forms the eligibility
trace :math:`e_{ji}^{t-1}`), and the learning signal :math:`L_j^t` emitted
by the readout neurons.
See the documentation on the :doc:`eprop_archiving_node<../models/eprop_archiving_node/>` for details on the surrogate
gradient functions.
.. start_surrogate-gradient-functions
Surrogate gradients help overcome the challenge of the spiking function's
non-differentiability, facilitating the use of gradient-based learning
techniques such as e-prop. The non-existent derivative of the spiking
variable with respect to the membrane voltage,
:math:`\frac{\partial z^t_j}{ \partial v^t_j}`, can be effectively
replaced with a variety of surrogate gradient functions, as detailed in
various studies (see, e.g., [3]_). NEST currently provides four
different surrogate gradient functions:
1. A piecewise linear function used, among others, in [1]_:
.. math::
\psi_j^t = \frac{ \gamma }{ v_\text{th} } \text{max}
\left( 0, 1-\beta \left| \frac{ v_j^t - v_\text{th} }{ v_\text{th} }\right| \right) \,. \\
2. An exponential function used in [4]_:
.. math::
\psi_j^t = \gamma \exp \left( -\beta \left| v_j^t - v_\text{th} \right| \right) \,. \\
3. The derivative of a fast sigmoid function used in [5]_:
.. math::
\psi_j^t = \frac{ \gamma }{ \left( 1 + \beta \left| v_j^t - v_\text{th} \right| \right)^2 } \,. \\
4. An arctan function used in [6]_:
.. math::
\psi_j^t = \frac{\gamma}{\pi} \frac{1}{ 1 + \left( \beta \pi \left( v_j^t - v_\text{th} \right) \right)^2 } \,. \\
.. end_surrogate-gradient-functions
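A minimal Python sketch of these four surrogates, for experimentation outside of NEST (illustrative only, not part of this commit; the function names, default parameter values, and the NumPy dependency are assumptions):

.. code-block:: python

   import numpy as np

   def psi_piecewise_linear(v, v_th, beta=1.0, gamma=0.3):
       # 1. piecewise linear surrogate, cf. [1]
       return gamma / v_th * np.maximum(0.0, 1.0 - beta * np.abs((v - v_th) / v_th))

   def psi_exponential(v, v_th, beta=1.0, gamma=0.3):
       # 2. exponential surrogate, cf. [4]
       return gamma * np.exp(-beta * np.abs(v - v_th))

   def psi_fast_sigmoid_deriv(v, v_th, beta=1.0, gamma=0.3):
       # 3. derivative of a fast sigmoid, cf. [5]
       return gamma / (1.0 + beta * np.abs(v - v_th)) ** 2

   def psi_arctan(v, v_th, beta=1.0, gamma=0.3):
       # 4. arctan surrogate, cf. [6]
       return gamma / np.pi / (1.0 + (beta * np.pi * (v - v_th)) ** 2)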
In the interval between two presynaptic spikes, the gradient is calculated
at each time step until the cutoff time point. This computation occurs over
@@ -272,6 +303,27 @@
References
van Albada SJ, Plesser HE, Bolten M, Diesmann M. Event-based
implementation of eligibility propagation (in preparation)
.. start_surrogate-gradient-references
.. [3] Neftci EO, Mostafa H, Zenke F (2019). Surrogate Gradient Learning in
Spiking Neural Networks. IEEE Signal Processing Magazine, 36(6), 51-63.
https://doi.org/10.1109/MSP.2019.2931595
.. [4] Shrestha SB, Orchard G (2018). SLAYER: Spike Layer Error Reassignment in
Time. Advances in Neural Information Processing Systems, 31:1412-1421.
https://proceedings.neurips.cc/paper_files/paper/2018/hash/82f2b308c3b01637c607ce05f52a2fed-Abstract.html
.. [5] Zenke F, Ganguli S (2018). SuperSpike: Supervised Learning in Multilayer
Spiking Neural Networks. Neural Computation, 30:1514–1541.
https://doi.org/10.1162/neco_a_01086
.. [6] Fang W, Yu Z, Chen Y, Huang T, Masquelier T, Tian Y (2021). Deep residual
learning in spiking neural networks. Advances in Neural Information
Processing Systems, 34:21056–21069.
https://proceedings.neurips.cc/paper/2021/hash/afe434653a898da20044041262b3ac74-Abstract.html
.. end_surrogate-gradient-references
Sends
+++++
9 changes: 7 additions & 2 deletions models/eprop_iaf_adapt.h
@@ -110,8 +110,9 @@
voltage :math:`\psi_j^{t-1}` (the product of which forms the eligibility
trace :math:`e_{ji}^{t-1}`), and the learning signal :math:`L_j^t` emitted
by the readout neurons.
See the documentation on the :doc:`eprop_archiving_node<../models/eprop_archiving_node/>` for details on the surrogate
gradient functions.
.. include:: ../models/eprop_iaf.rst
:start-after: .. start_surrogate-gradient-functions
:end-before: .. end_surrogate-gradient-functions
In the interval between two presynaptic spikes, the gradient is calculated
at each time step until the cutoff time point. This computation occurs over
@@ -287,6 +288,10 @@
References
van Albada SJ, Plesser HE, Bolten M, Diesmann M. Event-based
implementation of eligibility propagation (in preparation)
.. include:: ../models/eprop_iaf.rst
:start-after: .. start_surrogate-gradient-references
:end-before: .. end_surrogate-gradient-references
Sends
+++++
11 changes: 8 additions & 3 deletions models/eprop_iaf_adapt_bsshslm_2020.h
@@ -113,15 +113,16 @@
voltage :math:`\psi_j^t` (the product of which forms the eligibility
trace :math:`e_{ji}^t`), and the learning signal :math:`L_j^t` emitted
by the readout neurons.
See the documentation on the :doc:`eprop_archiving_node<../models/eprop_archiving_node/>` for details on the surrogate
gradient functions.
.. math::
\frac{ \text{d} E }{ \text{d} W_{ji} } &= \sum_t L_j^t \bar{e}_{ji}^t \,, \\
e_{ji}^t &= \psi_j^t \left( \bar{z}_i^{t-1} - \beta \epsilon_{ji,\text{a}}^{t-1} \right) \,, \\
\epsilon^{t-1}_{ji,\text{a}} &= \psi_j^{t-1} \bar{z}_i^{t-2} + \left( \rho - \psi_j^{t-1} \beta \right)
\epsilon^{t-2}_{ji,\text{a}} \,. \\
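A minimal sketch of one step of this recursion, assuming scalar state variables (all names are illustrative, not part of this commit):

.. code-block:: python

   def eligibility_step(psi_t, z_bar_prev, eps_a_prev, beta, rho):
       # e_ji^t = psi_j^t * (z_bar_i^{t-1} - beta * eps_{ji,a}^{t-1})
       e_t = psi_t * (z_bar_prev - beta * eps_a_prev)
       # eps_{ji,a}^t = psi_j^t * z_bar_i^{t-1} + (rho - psi_j^t * beta) * eps_{ji,a}^{t-1}
       eps_a_t = psi_t * z_bar_prev + (rho - psi_t * beta) * eps_a_prev
       return e_t, eps_a_t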
.. include:: ../models/eprop_iaf.rst
:start-after: .. start_surrogate-gradient-functions
:end-before: .. end_surrogate-gradient-functions
The eligibility trace and the presynaptic spike trains are low-pass filtered
with the following exponential kernels:
@@ -257,6 +258,10 @@
References
van Albada SJ, Plesser HE, Bolten M, Diesmann M. Event-based
implementation of eligibility propagation (in preparation)
.. include:: ../models/eprop_iaf.rst
:start-after: .. start_surrogate-gradient-references
:end-before: .. end_surrogate-gradient-references
Sends
+++++
11 changes: 8 additions & 3 deletions models/eprop_iaf_bsshslm_2020.h
@@ -106,13 +106,14 @@
voltage :math:`\psi_j^t` (the product of which forms the eligibility
trace :math:`e_{ji}^t`), and the learning signal :math:`L_j^t` emitted
by the readout neurons.
See the documentation on the :doc:`eprop_archiving_node<../models/eprop_archiving_node/>` for details on the surrogate
gradient functions.
.. math::
\frac{ \text{d} E }{ \text{d} W_{ji} } &= \sum_t L_j^t \bar{e}_{ji}^t \,, \\
e_{ji}^t &= \psi^t_j \bar{z}_i^{t-1} \,, \\
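A minimal sketch of how this gradient could be accumulated over time steps, assuming an exponential low-pass filter with constant ``kappa`` (the exact kernels are given below; all names and the default value are assumptions, not part of this commit):

.. code-block:: python

   def weight_gradient(learning_signals, surrogates, z_bar_prev, kappa=0.97):
       # dE/dW_ji = sum_t L_j^t * e_bar_ji^t, with e_ji^t = psi_j^t * z_bar_i^{t-1}
       grad, e_bar = 0.0, 0.0
       for L_t, psi_t, z_prev in zip(learning_signals, surrogates, z_bar_prev):
           e_t = psi_t * z_prev         # eligibility trace at step t
           e_bar = kappa * e_bar + e_t  # low-pass-filtered eligibility trace
           grad += L_t * e_bar
       return grad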
.. include:: ../models/eprop_iaf.rst
:start-after: .. start_surrogate-gradient-functions
:end-before: .. end_surrogate-gradient-functions
The eligibility trace and the presynaptic spike trains are low-pass filtered
with the following exponential kernels:
@@ -242,6 +243,10 @@
References
van Albada SJ, Plesser HE, Bolten M, Diesmann M. Event-based
implementation of eligibility propagation (in preparation)
.. include:: ../models/eprop_iaf.rst
:start-after: .. start_surrogate-gradient-references
:end-before: .. end_surrogate-gradient-references
Sends
+++++
74 changes: 0 additions & 74 deletions nestkernel/eprop_archiving_node.h
@@ -34,80 +34,6 @@

namespace nest
{
/* BeginUserDocs: e-prop plasticity
Short description
+++++++++++++++++
Archiving node managing the history of e-prop variables.
Description
+++++++++++
The archiving node comprises a set of functions for writing the values of
the e-prop variables to history and retrieving them, as well as
functions to compute, for example, the firing rate regularization and the
surrogate gradient.
Surrogate gradient functions
++++++++++++++++++++++++++++
Surrogate gradients help overcome the challenge of the spiking function's
non-differentiability, facilitating the use of gradient-based learning
techniques such as e-prop. The non-existent derivative of the spiking
variable with respect to the membrane voltage,
:math:`\frac{\partial z^t_j}{ \partial v^t_j}`, can be effectively
replaced with a variety of surrogate gradient functions, as detailed in
various studies (see, e.g., [1]_). NEST currently provides four
different surrogate gradient functions:
1. A piecewise linear function used, among others, in [2]_:
.. math::
\psi_j^t = \frac{ \gamma }{ v_\text{th} } \text{max}
\left( 0, 1-\beta \left| \frac{ v_j^t - v_\text{th} }{ v_\text{th} }\right| \right) \,. \\
2. An exponential function used in [3]_:
.. math::
\psi_j^t = \gamma \exp \left( -\beta \left| v_j^t - v_\text{th} \right| \right) \,. \\
3. The derivative of a fast sigmoid function used in [4]_:
.. math::
\psi_j^t = \frac{ \gamma }{ \left( 1 + \beta \left| v_j^t - v_\text{th} \right| \right)^2 } \,. \\
4. An arctan function used in [5]_:
.. math::
\psi_j^t = \frac{\gamma}{\pi} \frac{1}{ 1 + \left( \beta \pi \left( v_j^t - v_\text{th} \right) \right)^2 } \,. \\
References
++++++++++
.. [1] Neftci EO, Mostafa H, Zenke F (2019). Surrogate Gradient Learning in
Spiking Neural Networks. IEEE Signal Processing Magazine, 36(6), 51-63.
https://doi.org/10.1109/MSP.2019.2931595
.. [2] Bellec G, Scherr F, Subramoney A, Hajek E, Salaj D, Legenstein R,
Maass W (2020). A solution to the learning dilemma for recurrent
networks of spiking neurons. Nature Communications, 11:3625.
https://doi.org/10.1038/s41467-020-17236-y
.. [3] Shrestha SB, Orchard G (2018). SLAYER: Spike Layer Error Reassignment in
Time. Advances in Neural Information Processing Systems, 31:1412-1421.
https://proceedings.neurips.cc/paper_files/paper/2018/hash/82f2b308c3b01637c607ce05f52a2fed-Abstract.html
.. [4] Zenke F, Ganguli S (2018). SuperSpike: Supervised Learning in Multilayer
Spiking Neural Networks. Neural Computation, 30:1514–1541.
https://doi.org/10.1162/neco_a_01086
.. [5] Fang W, Yu Z, Chen Y, Huang T, Masquelier T, Tian Y (2021). Deep residual
learning in spiking neural networks. Advances in Neural Information
Processing Systems, 34:21056–21069.
https://proceedings.neurips.cc/paper/2021/hash/afe434653a898da20044041262b3ac74-Abstract.html
EndUserDocs */

/**
* Base class implementing an intermediate archiving node model for node models supporting e-prop plasticity
* according to Bellec et al. (2020) and supporting additional biological features described in Korcsak-Gorzo,