From 2dfbcec7d9c350fb53a31d8b60d98a2d3b85d0dc Mon Sep 17 00:00:00 2001
From: jloveric
Date: Sat, 30 Dec 2023 09:25:10 -0800
Subject: [PATCH] Update readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2374a29..e1bf3a0 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ these type of networks that have potentially steep gradients due to the polynomi
 kaiming initialization seems to be performing better than linear initialization, but I
 need to investigate this further.
 
 ### sparse mlp
-A few networks which are large enough to memorize "The Dunwich Horror" which is fairly short (120KB). Using Adam + learning rate scheduler.
+A few networks which are large enough to memorize "The Dunwich Horror" which is fairly short (120KB). Using Lion optimizer + learning rate scheduler.
 #### Piecewise constant
 Piecewise constant (requires discontinuous). Only the first layer can actually be optimized since derivatives beyond that are zero
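The patch above swaps Adam for the Lion optimizer in the memorization experiments. As a rough illustration of what that change means (this is not the repo's actual training code), a single Lion update step can be sketched in NumPy; the function name `lion_step` and the hyperparameter defaults here are illustrative assumptions, not taken from the patch:

```python
import numpy as np

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update: the step direction is the sign of an
    interpolation between the momentum buffer and the gradient.

    Returns the updated parameter and momentum arrays.
    """
    # Direction only: sign() discards gradient magnitude entirely.
    update = np.sign(beta1 * momentum + (1.0 - beta1) * grad)
    # Decoupled weight decay, applied directly to the parameter.
    new_param = param - lr * (update + weight_decay * param)
    # Momentum is updated with a second interpolation coefficient.
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return new_param, new_momentum

# Toy usage: because of the sign(), the step size is exactly lr
# per coordinate, no matter how differently scaled the gradients are.
p = np.array([1.0, -1.0])
m = np.zeros(2)
g = np.array([100.0, -0.001])  # wildly different magnitudes
p, m = lion_step(p, g, m, lr=0.1)
print(p)  # → [ 0.9 -0.9]
```

The sign-based update is one plausible reason for pairing Lion with a learning rate scheduler, as the diff does: since every step has magnitude `lr` regardless of gradient scale, the schedule alone controls how finely the network can settle late in training.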