
# MadamOpt.jl


## Summary

This package is a testing ground for extensions to Adam (Adaptive Moment Estimation), written in Julia. MadamOpt.jl was born out of a need for gradient-free online optimization.

Note that while this library could be used to train deep models, that is not its chief design goal. Nevertheless, an example of using MadamOpt with FluxML is included in the examples directory (the library supports GPU acceleration / CUDA when a gradient is provided).

## Features

The extensions currently implemented by the library are:

- L1 regularization via ISTA (the Iterative Shrinkage-Thresholding Algorithm); see the sketch after this list.
- Gradient-free optimization via a discrete approximation of the gradient, computed over a subset of model parameters at each iteration (suitable for small to medium-sized models); also illustrated in the sketch below.
- A technique loosely based on simulated annealing for estimating non-convex functions without using a gradient.
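
The sketch below illustrates the first two ideas in isolation. It assumes a plain Julia objective `f`; the names `soft_threshold` and `subset_gradient` are illustrative only and are not part of MadamOpt's API.

```julia
using Random

# ISTA-style soft-thresholding: shrink each parameter toward zero by `lambda`,
# setting small entries exactly to zero (this is what induces L1 sparsity).
soft_threshold(x, lambda) = sign.(x) .* max.(abs.(x) .- lambda, 0.0)

# Gradient-free step: approximate the gradient by central finite differences on
# a random subset of `k` coordinates, leaving the remaining entries at zero.
function subset_gradient(f, x; k = 10, h = 1e-6)
    g = zeros(length(x))
    for i in randperm(length(x))[1:min(k, length(x))]
        e = zeros(length(x)); e[i] = h
        g[i] = (f(x .+ e) - f(x .- e)) / (2h)
    end
    return g
end

# Example: a partial gradient estimate of a quadratic at a random point.
g = subset_gradient(x -> sum(abs2, x), randn(100); k = 5)
```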

In standard Adam, the normalization of the gradient prevents the thresholding from affecting only the relatively insignificant features: dividing the first-moment (mean gradient) estimate by the square root of the uncentered second moment yields a factor between -1.0 and 1.0 (modulo differences in the two decay rates), so every parameter takes a step of roughly the same magnitude regardless of its gradient. To reintroduce the gradient's magnitude, the step size is further scaled by log(1 + abs(gradient)).
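
A minimal sketch of this extra scaling, assuming bias-corrected Adam moment estimates `m̂` and `v̂` (names and structure are illustrative, not MadamOpt's internals):

```julia
# One Adam-style step with the additional log(1 + |g|) factor described above,
# which restores the gradient magnitude that Adam's normalization removes.
adam_step(alpha, m̂, v̂, g; eps = 1e-8) =
    alpha .* log1p.(abs.(g)) .* m̂ ./ (sqrt.(v̂) .+ eps)
```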

See the unit tests for examples of fitting a 100-dimensional non-convex Ackley function, a sparse 500x250 matrix, and the Rosenbrock function.
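
For reference, a standard definition of the Rosenbrock objective mentioned above (written here independently of the test suite):

```julia
# Generalized Rosenbrock function: non-convex, with a narrow curved valley and
# global minimum at x = (1, ..., 1).
rosenbrock(x) = sum(100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 for i in 1:length(x)-1)
```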

For an API overview, see the docs, unit tests, and examples.