From 2dfbcec7d9c350fb53a31d8b60d98a2d3b85d0dc Mon Sep 17 00:00:00 2001
From: jloveric
Date: Sat, 30 Dec 2023 09:25:10 -0800
Subject: [PATCH] Update readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2374a29..e1bf3a0 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ these type of networks that have potentially steep gradients due to the polynomi
 kaiming initialization seems to be performing better than linear initialization, but I
 need to investigate this further.
 
 ### sparse mlp
-A few networks which are large enough to memorize "The Dunwich Horror" which is fairly short (120KB). Using Adam + learning rate scheduler.
+A few networks which are large enough to memorize "The Dunwich Horror" which is fairly short (120KB). Using Lion optimizer + learning rate scheduler.
 #### Piecewise constant
 Piecewise constant (requires discontinuous). Only the first layer can actually be optimized since derivatives beyond that are zero
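The patch above swaps Adam for the Lion optimizer in the memorization experiments. As a rough illustration of what that change means (this is not the repo's actual training code), a single Lion update step can be sketched in NumPy; the function name `lion_step` and the hyperparameter defaults here are illustrative assumptions, not taken from the patch:

```python
import numpy as np

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update: the step direction is the sign of an
    interpolation between the momentum buffer and the gradient.

    Returns the updated parameter and momentum arrays.
    """
    # Direction only: sign() discards gradient magnitude entirely.
    update = np.sign(beta1 * momentum + (1.0 - beta1) * grad)
    # Decoupled weight decay, applied directly to the parameter.
    new_param = param - lr * (update + weight_decay * param)
    # Momentum is updated with a second interpolation coefficient.
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return new_param, new_momentum

# Toy usage: because of the sign(), the step size is exactly lr
# per coordinate, no matter how differently scaled the gradients are.
p = np.array([1.0, -1.0])
m = np.zeros(2)
g = np.array([100.0, -0.001])  # wildly different magnitudes
p, m = lion_step(p, g, m, lr=0.1)
print(p)  # → [ 0.9 -0.9]
```

The sign-based update is one plausible reason for pairing Lion with a learning rate scheduler, as the diff does: since every step has magnitude `lr` regardless of gradient scale, the schedule alone controls how finely the network can settle late in training.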