Update readme
jloveric committed May 7, 2024
1 parent 8292956 commit b3596d8
Showing 3 changed files with 5 additions and 7 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -127,11 +127,10 @@ super high accuracy. Still need to create a text generator for evaluation with t
python examples/high_order_interpolation.py data.type=sequence net=large_standard net.hidden.width=1000 max_epochs=100 optimizer.lr=1e-4 net.model_type=low_order_mlp
```
## Dual Convolution
-The idea is to repeatedly apply the same high order 1d convolution to reduce the input sequence to a single remaining vector. The update is dynamic and the number of times the convolution is applied depends on the length of the sequence. This was inspired by Mamba, but it is really nothing like Mamba; it is more like a variable-depth high order convnet. The command to run is
+The idea is to repeatedly apply the same high order 1d convolution to reduce the input sequence to a single remaining vector. The update is dynamic and the number of times the convolution is applied depends on the length of the sequence. Remarkably, this actually sort of works, but it is insanely slow. Maybe I can get faster convergence somehow.
```
-python examples/high_order_interpolation.py data.type=sequence net=dual_convolution max_epochs=100 optimizer.lr=1e-4 batch_size=32 net.layer_type=continuous data.repeats=5 net.n=2 data.max_features=10 optimizer.patience=20 initialize.type=linear
+python examples/high_order_interpolation.py data.type=sequence net=dual_convolution max_epochs=100 optimizer.lr=1e-5 batch_size=32 net.layer_type=discontinuous data.repeats=1 net.n=3 net.segments=4 data.max_features=10 optimizer.patience=20 net.embedding_dimension=128 net.hidden_width=1024 net.normalize=maxabs initialize.type=linear
```
-Not surprisingly, this technique does not seem to work particularly well. So far it has been unable to generate more than a few letters.
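
Below is a minimal sketch of the reduction loop described above, assuming the shared block merges each concatenated pair of adjacent vectors (width `2 * out_width`) back into a single vector (width `out_width`). The `PairReduce` stand-in, the odd-length padding, and the plain linear layers are illustrative only; they are not the repository's actual `HighOrderMLP`-based implementation, and the real kernel size, striding, and normalization may differ.
```
import torch
from torch import nn


class PairReduce(nn.Module):
    """Stand-in for the shared block that merges a concatenated pair of
    adjacent vectors (2 * width) back into a single vector (width)."""

    def __init__(self, width: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * width, width), nn.ReLU(), nn.Linear(width, width)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def dual_convolution_reduce(x: torch.Tensor, block: nn.Module) -> torch.Tensor:
    """Apply the same block over and over until one vector remains.

    x: (batch, seq_len, width). The number of passes depends on seq_len,
    so the effective depth is dynamic, roughly ceil(log2(seq_len)).
    """
    while x.shape[1] > 1:
        if x.shape[1] % 2 == 1:
            # Pad with a copy of the last vector so the sequence splits into
            # pairs (the real implementation may handle odd lengths differently).
            x = torch.cat([x, x[:, -1:, :]], dim=1)
        pairs = x.reshape(x.shape[0], x.shape[1] // 2, -1)  # (batch, seq/2, 2*width)
        x = block(pairs)                                     # (batch, seq/2, width)
    return x.squeeze(1)  # (batch, width)


# Example: reduce a length-10 sequence of 128-wide vectors to a single vector.
block = PairReduce(width=128)
print(dual_convolution_reduce(torch.randn(4, 10, 128), block).shape)  # torch.Size([4, 128])
```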

## Notes
I use an input layer (continuous or discontinuous) with 128 segments, one for each ASCII character. You can bump this down to 64, but the convergence doesn't seem quite as good; presumably it still works because most books don't use all the ASCII characters anyway.
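
As a rough, hypothetical illustration of why 128 segments line up one-to-one with ASCII codes (the repository's actual character encoding and input scaling may differ), each character code can be normalized so it falls inside its own segment of the piecewise input layer:
```
def char_to_input(c: str, segments: int = 128) -> float:
    """Map an ASCII character to [-1, 1] so each code point lands in its
    own segment of a piecewise (high order) input layer."""
    code = ord(c)
    assert code < segments, "non-ASCII characters would need more segments"
    # Center the value inside its segment rather than on a boundary.
    return 2.0 * (code + 0.5) / segments - 1.0


print(char_to_input("A"))  # 65 -> about 0.023
```
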
3 changes: 1 addition & 2 deletions config/net/dual_convolution.yaml
@@ -5,12 +5,11 @@ model_type: dual_convolution
n: 3

segments: 2
-#base_width: 8

in_width: 1
out_width: 128
embedding_dimension: 1024
-hidden_width: 128
+hidden_width: 1024
hidden_layers: 2
in_segments: 128
accelerator: cuda
4 changes: 2 additions & 2 deletions language_interpolation/dual_convolutional_network.py
@@ -27,7 +27,7 @@ def __init__(
        self.device = device
        self.interior_normalization = normalization()
        self.input_layer = HighOrderMLP(
-            layer_type="continuous",
+            layer_type="discontinuous",
            n=n,
            in_width=in_width,
            in_segments=in_segments,
@@ -40,7 +40,7 @@ def __init__(
            normalization=normalization
        )
        self.equal_layers = HighOrderMLP(
-            layer_type="continuous",
+            layer_type="discontinuous",
            n=n,
            in_width=2 * out_width,
            out_width=out_width,
