Merge pull request #379 from rcurtin/scd-to-cd

Rename `SCD` to `CD`
mlpack · Sep 29, 2023 · 42cdf42
2 parents b97c8b0 + 5bc145b

Showing 11 changed files with 144 additions and 120 deletions.
diff --git a/HISTORY.md b/HISTORY.md
@@ -9,6 +9,9 @@
  * Fix CNE test tolerances
    ([#360](https://github.com/mlpack/ensmallen/pull/360)).
 
+ * Rename `SCD` optimizer to `CD`
+   ([#379](https://github.com/mlpack/ensmallen/pull/379)).
+
 ### ensmallen 2.19.1: "Eight Ball Deluxe"
 ###### 2023-01-30
  * Avoid deprecation warnings in Armadillo 11.2+

diff --git a/doc/function_types.md b/doc/function_types.md
@@ -307,7 +307,7 @@ regular implementation of the `Gradient()`, so that function may be omitted.
 If these functions are implemented, the following partially differentiable
 function optimizers can be used:
 
- - [Stochastic Coordinate Descent](#stochastic-coordinate-descent-scd)
+ - [Coordinate Descent](#coordinate-descent-cd)
 
 ## Arbitrary separable functions
 

diff --git a/doc/optimizers.md b/doc/optimizers.md
@@ -778,6 +778,82 @@ optimizer2.Optimize(f, coordinates);
  * [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
  * [SGD](#standard-sgd)
 
+## Coordinate Descent (CD)
+
+*An optimizer for [partially differentiable functions](#partially-differentiable-functions).*
+
+Coordinate descent is a technique for minimizing a function by doing a line
+search along a single direction at the current point in the iteration. The
+direction (or "coordinate") can be chosen cyclically, randomly or in a greedy
+fashion.
+
+#### Constructors
+
+ * `CD<`_`DescentPolicyType`_`>()`
+ * `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations`_`)`
+ * `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval`_`)`
+ * `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval, descentPolicy`_`)`
+
+The _`DescentPolicyType`_ template parameter specifies the behavior of CD when
+selecting the next coordinate to descend with.  The `RandomDescent`,
+`GreedyDescent`, and `CyclicDescent` classes are available for use.  Custom
+behavior can be achieved by implementing a class with the same method
+signatures.
+
+For convenience, the following typedefs have been defined:
+
+ * `RandomCD` (equivalent to `CD<RandomDescent>`): selects coordinates randomly
+ * `GreedyCD` (equivalent to `CD<GreedyDescent>`): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
+ * `CyclicCD` (equivalent to `CD<CyclicDescent>`): selects coordinates sequentially
+
+***Note***: `CD` used to be called `SCD`.  The name `SCD` is deprecated and will
+be removed in ensmallen 3.
+
+#### Attributes
+
+| **type** | **name** | **description** | **default** |
+|----------|----------|-----------------|-------------|
+| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
+| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
+| `double` | **`tolerance`** | Maximum absolute tolerance to terminate the algorithm. | `1e-5` |
+| `size_t` | **`updateInterval`** | The interval at which the objective is to be reported and checked for convergence. | `1e3` |
+| `DescentPolicyType` | **`descentPolicy`** | The policy to use for selecting the coordinate to descend on. | `DescentPolicyType()` |
+
+Attributes of the optimizer may also be modified via the member methods
+`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdateInterval()`, and
+`DescentPolicy()`.
+
+Note that the default value for `descentPolicy` is a default-constructed
+_`DescentPolicyType`_ object.
+
+#### Examples
+
+<details open>
+<summary>Click to collapse/expand example code.
+</summary>
+
+```c++
+SparseTestFunction f;
+arma::mat coordinates = f.GetInitialPoint();
+
+RandomCD randomcd(0.01, 100000, 1e-5, 1e3);
+randomcd.Optimize(f, coordinates);
+
+GreedyCD greedycd(0.01, 100000, 1e-5, 1e3);
+greedycd.Optimize(f, coordinates);
+
+CyclicCD cycliccd(0.01, 100000, 1e-5, 1e3);
+cycliccd.Optimize(f, coordinates);
+```
+
+</details>
+
+#### See also:
+
+ * [Coordinate descent on Wikipedia](https://en.wikipedia.org/wiki/Coordinate_descent)
+ * [Stochastic Methods for L1-Regularized Loss Minimization](https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf)
+ * [Partially differentiable functions](#partially-differentiable-functions)
+
 ## CMAES
 
 *An optimizer for [separable functions](#separable-functions).*
@@ -2807,79 +2883,6 @@ optimizer.Optimize(f, coordinates);
  * [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
  * [Differentiable separable functions](#differentiable-separable-functions)
 
-## Stochastic Coordinate Descent (SCD)
-
-*An optimizer for [partially differentiable functions](#partially-differentiable-functions).*
-
-Stochastic Coordinate descent is a technique for minimizing a function by
-doing a line search along a single direction at the current point in the
-iteration. The direction (or "coordinate") can be chosen cyclically, randomly
-or in a greedy fashion.
-
-#### Constructors
-
- * `SCD<`_`DescentPolicyType`_`>()`
- * `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations`_`)`
- * `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval`_`)`
- * `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval, descentPolicy`_`)`
-
-The _`DescentPolicyType`_ template parameter specifies the behavior of SCD when
-selecting the next coordinate to descend with.  The `RandomDescent`,
-`GreedyDescent`, and `CyclicDescent` classes are available for use.  Custom
-behavior can be achieved by implementing a class with the same method
-signatures.
-
-For convenience, the following typedefs have been defined:
-
- * `RandomSCD` (equivalent to `SCD<RandomDescent>`): selects coordinates randomly
- * `GreedySCD` (equivalent to `SCD<GreedyDescent>`): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
- * `CyclicSCD` (equivalent to `SCD<CyclicDescent>`): selects coordinates sequentially
-
-#### Attributes
-
-| **type** | **name** | **description** | **default** |
-|----------|----------|-----------------|-------------|
-| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
-| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
-| `double` | **`tolerance`** | Maximum absolute tolerance to terminate the algorithm. | `1e-5` |
-| `size_t` | **`updateInterval`** | The interval at which the objective is to be reported and checked for convergence. | `1e3` |
-| `DescentPolicyType` | **`descentPolicy`** | The policy to use for selecting the coordinate to descend on. | `DescentPolicyType()` |
-
-Attributes of the optimizer may also be modified via the member methods
-`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdateInterval()`, and
-`DescentPolicy()`.
-
-Note that the default value for `descentPolicy` is the default constructor for
-_`DescentPolicyType`_.
-
-#### Examples
-
-<details open>
-<summary>Click to collapse/expand example code.
-</summary>
-
-```c++
-SparseTestFunction f;
-arma::mat coordinates = f.GetInitialPoint();
-
-RandomSCD randomscd(0.01, 100000, 1e-5, 1e3);
-randomscd.Optimize(f, coordinates);
-
-GreedySCD greedyscd(0.01, 100000, 1e-5, 1e3);
-greedyscd.Optimize(f, coordinates);
-
-CyclicSCD cyclicscd(0.01, 100000, 1e-5, 1e3);
-cyclicscd.Optimize(f, coordinates);
-```
-
-</details>
-
-#### See also:
-
- * [Coordinate descent on Wikipedia](https://en.wikipedia.org/wiki/Coordinate_descent)
- * [Stochastic Methods for L1-Regularized Loss Minimization](https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf)
- * [Partially differentiable functions](#partially-differentiable-functions)
-
 ## Stochastic Gradient Descent with Restarts (SGDR)
 
 *An optimizer for [differentiable separable

diff --git a/include/ensmallen.hpp b/include/ensmallen.hpp
@@ -98,6 +98,7 @@
 #include "ensmallen_bits/bigbatch_sgd/bigbatch_sgd.hpp"
 #include "ensmallen_bits/cmaes/cmaes.hpp"
 #include "ensmallen_bits/cmaes/active_cmaes.hpp"
+#include "ensmallen_bits/cd/cd.hpp"
 #include "ensmallen_bits/cne/cne.hpp"
 #include "ensmallen_bits/de/de.hpp"
 #include "ensmallen_bits/eve/eve.hpp"
@@ -119,7 +120,6 @@
 
 #include "ensmallen_bits/sa/sa.hpp"
 #include "ensmallen_bits/sarah/sarah.hpp"
-#include "ensmallen_bits/scd/scd.hpp"
 #include "ensmallen_bits/sdp/sdp.hpp"
 #include "ensmallen_bits/sdp/lrsdp.hpp"
 #include "ensmallen_bits/sdp/primal_dual.hpp"

diff --git a/include/ensmallen_bits/scd/scd.hpp → include/ensmallen_bits/cd/cd.hpp b/include/ensmallen_bits/scd/scd.hpp → include/ensmallen_bits/cd/cd.hpp
@@ -1,16 +1,16 @@
 /**
- * @file scd.hpp
+ * @file cd.hpp
  * @author Shikhar Bhardwaj
  *
- * Stochastic Coordinate Descent (SCD).
+ * Coordinate Descent (CD).
  *
  * ensmallen is free software; you may redistribute it and/or modify it under
  * the terms of the 3-clause BSD license.  You should have received a copy of
  * the 3-clause BSD license along with ensmallen.  If not, see
  * http://www.opensource.org/licenses/BSD-3-Clause for more information.
  */
-#ifndef ENSMALLEN_SCD_SCD_HPP
-#define ENSMALLEN_SCD_SCD_HPP
+#ifndef ENSMALLEN_CD_CD_HPP
+#define ENSMALLEN_CD_CD_HPP
 
 #include "descent_policies/cyclic_descent.hpp"
 #include "descent_policies/random_descent.hpp"
@@ -42,19 +42,19 @@ namespace ens {
  * }
  * @endcode
  *
- * SCD can optimize partially differentiable functions.  For more details, see
+ * CD can optimize partially differentiable functions.  For more details, see
  * the documentation on function types included with this distribution or on the
  * ensmallen website.
  *
  * @tparam DescentPolicy Descent policy to decide the order in which the
  *     coordinate for descent is selected.
  */
 template <typename DescentPolicyType = RandomDescent>
-class SCD
+class CD
 {
  public:
   /**
-   * Construct the SCD optimizer with the given function and parameters. The
+   * Construct the CD optimizer with the given function and parameters. The
    * default values here are not necessarily good for every problem, so it is
    * suggested that the values used are tailored for the task at hand. The
    * maximum number of iterations refers to the maximum number of "descents"
@@ -70,11 +70,11 @@ class SCD
    * @param descentPolicy The policy to use for picking up the coordinate to
    *    descend on.
    */
-  SCD(const double stepSize = 0.01,
-      const size_t maxIterations = 100000,
-      const double tolerance = 1e-5,
-      const size_t updateInterval = 1e3,
-      const DescentPolicyType descentPolicy = DescentPolicyType());
+  CD(const double stepSize = 0.01,
+     const size_t maxIterations = 100000,
+     const double tolerance = 1e-5,
+     const size_t updateInterval = 1e3,
+     const DescentPolicyType descentPolicy = DescentPolicyType());
 
   /**
    * Optimize the given function using stochastic coordinate descent. The
@@ -158,6 +158,24 @@ class SCD
 } // namespace ens
 
 // Include implementation.
-#include "scd_impl.hpp"
+#include "cd_impl.hpp"
+
+namespace ens {
+
+/**
+ * Backwards-compatibility alias; this can be removed after ensmallen 3.10.0.
+ * The history here is that CD was originally named SCD, but that is an
+ * inaccurate name because this is not a stochastic technique; thus, it was
+ * renamed CD.
+ */
+template<typename DescentPolicyType = RandomDescent>
+using SCD = CD<DescentPolicyType>;
+
+// Convenience typedefs.
+using RandomCD = CD<RandomDescent>;
+using GreedyCD = CD<GreedyDescent>;
+using CyclicCD = CD<CyclicDescent>;
+
+} // namespace ens
 
 #endif
diff --git a/include/ensmallen_bits/scd/scd_impl.hpp → include/ensmallen_bits/cd/cd_impl.hpp b/include/ensmallen_bits/scd/scd_impl.hpp → include/ensmallen_bits/cd/cd_impl.hpp
@@ -1,26 +1,26 @@
 /**
- * @file scd_impl.hpp
+ * @file cd_impl.hpp
  * @author Shikhar Bhardwaj
  *
- * Implementation of stochastic coordinate descent.
+ * Implementation of coordinate descent.
  *
  * ensmallen is free software; you may redistribute it and/or modify it under
  * the terms of the 3-clause BSD license.  You should have received a copy of
  * the 3-clause BSD license along with ensmallen.  If not, see
  * http://www.opensource.org/licenses/BSD-3-Clause for more information.
  */
-#ifndef ENSMALLEN_SCD_SCD_IMPL_HPP
-#define ENSMALLEN_SCD_SCD_IMPL_HPP
+#ifndef ENSMALLEN_CD_CD_IMPL_HPP
+#define ENSMALLEN_CD_CD_IMPL_HPP
 
 // In case it hasn't been included yet.
-#include "scd.hpp"
+#include "cd.hpp"
 
 #include <ensmallen_bits/function.hpp>
 
 namespace ens {
 
 template <typename DescentPolicyType>
-SCD<DescentPolicyType>::SCD(
+CD<DescentPolicyType>::CD(
     const double stepSize,
     const size_t maxIterations,
     const double tolerance,
@@ -41,7 +41,7 @@ template <typename ResolvableFunctionType,
           typename... CallbackTypes>
 typename std::enable_if<IsArmaType<GradType>::value,
 typename MatType::elem_type>::type
-SCD<DescentPolicyType>::Optimize(
+CD<DescentPolicyType>::Optimize(
     ResolvableFunctionType& function,
     MatType& iterateIn,
     CallbackTypes&&... callbacks)
@@ -94,12 +94,12 @@ SCD<DescentPolicyType>::Optimize(
           overallObjective, callbacks...);
 
       // Output current objective function.
-      Info << "SCD: iteration " << i << ", objective " << overallObjective
+      Info << "CD: iteration " << i << ", objective " << overallObjective
           << "." << std::endl;
 
       if (std::isnan(overallObjective) || std::isinf(overallObjective))
       {
-        Warn << "SCD: converged to " << overallObjective << "; terminating"
+        Warn << "CD: converged to " << overallObjective << "; terminating"
             << " with failure.  Try a smaller step size?" << std::endl;
 
         Callback::EndOptimization(*this, function, iterate, callbacks...);
@@ -108,7 +108,7 @@ SCD<DescentPolicyType>::Optimize(
 
       if (std::abs(lastObjective - overallObjective) < tolerance)
       {
-        Info << "SCD: minimized within tolerance " << tolerance << "; "
+        Info << "CD: minimized within tolerance " << tolerance << "; "
             << "terminating optimization." << std::endl;
 
         Callback::EndOptimization(*this, function, iterate, callbacks...);
@@ -119,7 +119,7 @@ SCD<DescentPolicyType>::Optimize(
     }
   }
 
-  Info << "SCD: maximum iterations (" << maxIterations << ") reached; "
+  Info << "CD: maximum iterations (" << maxIterations << ") reached; "
       << "terminating optimization." << std::endl;
 
   // Calculate and return final objective.

diff --git a/...s/scd/descent_policies/cyclic_descent.hpp → ...ts/cd/descent_policies/cyclic_descent.hpp b/...s/scd/descent_policies/cyclic_descent.hpp → ...ts/cd/descent_policies/cyclic_descent.hpp
diff --git a/...s/scd/descent_policies/greedy_descent.hpp → ...ts/cd/descent_policies/greedy_descent.hpp b/...s/scd/descent_policies/greedy_descent.hpp → ...ts/cd/descent_policies/greedy_descent.hpp
diff --git a/...s/scd/descent_policies/random_descent.hpp → ...ts/cd/descent_policies/random_descent.hpp b/...s/scd/descent_policies/random_descent.hpp → ...ts/cd/descent_policies/random_descent.hpp
diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
@@ -11,6 +11,7 @@ set(ENSMALLEN_TESTS_SOURCES
     aug_lagrangian_test.cpp
     bigbatch_sgd_test.cpp
     callbacks_test.cpp
+    cd_test.cpp
     cmaes_test.cpp
     cne_test.cpp
     de_test.cpp
@@ -39,7 +40,6 @@ set(ENSMALLEN_TESTS_SOURCES
     rmsprop_test.cpp
     sa_test.cpp
     sarah_test.cpp
-    scd_test.cpp
     sdp_primal_dual_test.cpp
     sgdr_test.cpp
     sgd_test.cpp