Update readme
jloveric committed May 7, 2024
1 parent 8292956 commit b3596d8
Showing 3 changed files with 5 additions and 7 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -127,11 +127,10 @@ super high accuracy. Still need to create a text generator for evaluation with t
python examples/high_order_interpolation.py data.type=sequence net=large_standard net.hidden.width=1000 max_epochs=100 optimizer.lr=1e-4 net.model_type=low_order_mlp
```
## Dual Convolution
-The idea is to repeatedly apply the same high order 1d convolution to reduce the input sequence to a single remaining vector. The update is dynamic and the number of times the convolution is applied depends on the length of the sequence. This was inspired by Mamba, but it is really nothing like Mamba; it is more like a variable-depth high order convnet. The command to run is
+The idea is to repeatedly apply the same high order 1d convolution to reduce the input sequence to a single remaining vector. The update is dynamic and the number of times the convolution is applied depends on the length of the sequence. Remarkably, this actually sort of works, but it is insanely slow. Maybe I can get faster convergence somehow.
```
-python examples/high_order_interpolation.py data.type=sequence net=dual_convolution max_epochs=100 optimizer.lr=1e-4 batch_size=32 net.layer_type=continuous data.repeats=5 net.n=2 data.max_features=10 optimizer.patience=20 initialize.type=linear
+python examples/high_order_interpolation.py data.type=sequence net=dual_convolution max_epochs=100 optimizer.lr=1e-5 batch_size=32 net.layer_type=discontinuous data.repeats=1 net.n=3 net.segments=4 data.max_features=10 optimizer.patience=20 net.embedding_dimension=128 net.hidden_width=1024 net.normalize=maxabs initialize.type=linear
```
-Not surprisingly, this technique does not seem to work particularly well. So far it has been unable to generate more than a few letters.
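
Below is a minimal sketch of the reduction loop described above, assuming the shared block merges each concatenated pair of adjacent vectors (width `2 * out_width`) back into a single vector (width `out_width`). The `PairReduce` stand-in, the odd-length padding, and the plain linear layers are illustrative only; they are not the repository's actual `HighOrderMLP`-based implementation, and the real kernel size, striding, and normalization may differ.
```
import torch
from torch import nn


class PairReduce(nn.Module):
    """Stand-in for the shared block that merges a concatenated pair of
    adjacent vectors (2 * width) back into a single vector (width)."""

    def __init__(self, width: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * width, width), nn.ReLU(), nn.Linear(width, width)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def dual_convolution_reduce(x: torch.Tensor, block: nn.Module) -> torch.Tensor:
    """Apply the same block over and over until one vector remains.

    x: (batch, seq_len, width). The number of passes depends on seq_len,
    so the effective depth is dynamic, roughly ceil(log2(seq_len)).
    """
    while x.shape[1] > 1:
        if x.shape[1] % 2 == 1:
            # Pad with a copy of the last vector so the sequence splits into
            # pairs (the real implementation may handle odd lengths differently).
            x = torch.cat([x, x[:, -1:, :]], dim=1)
        pairs = x.reshape(x.shape[0], x.shape[1] // 2, -1)  # (batch, seq/2, 2*width)
        x = block(pairs)                                     # (batch, seq/2, width)
    return x.squeeze(1)  # (batch, width)


# Example: reduce a length-10 sequence of 128-wide vectors to a single vector.
block = PairReduce(width=128)
print(dual_convolution_reduce(torch.randn(4, 10, 128), block).shape)  # torch.Size([4, 128])
```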

## Notes
I use an input layer (continuous or discontinuous) with 128 segments, one for each ASCII character. You can bump this down to 64, but the convergence doesn't seem quite as good; presumably it still works because most books don't use all the ASCII characters anyway.
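
As a rough, hypothetical illustration of why 128 segments line up one-to-one with ASCII codes (the repository's actual character encoding and input scaling may differ), each character code can be normalized so it falls inside its own segment of the piecewise input layer:
```
def char_to_input(c: str, segments: int = 128) -> float:
    """Map an ASCII character to [-1, 1] so each code point lands in its
    own segment of a piecewise (high order) input layer."""
    code = ord(c)
    assert code < segments, "non-ASCII characters would need more segments"
    # Center the value inside its segment rather than on a boundary.
    return 2.0 * (code + 0.5) / segments - 1.0


print(char_to_input("A"))  # 65 -> about 0.023
```
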
3 changes: 1 addition & 2 deletions config/net/dual_convolution.yaml
@@ -5,12 +5,11 @@ model_type: dual_convolution
n: 3

segments: 2
-#base_width: 8

in_width: 1
out_width: 128
embedding_dimension: 1024
-hidden_width: 128
+hidden_width: 1024
hidden_layers: 2
in_segments: 128
accelerator: cuda
4 changes: 2 additions & 2 deletions language_interpolation/dual_convolutional_network.py
@@ -27,7 +27,7 @@ def __init__(
        self.device = device
        self.interior_normalization = normalization()
        self.input_layer = HighOrderMLP(
-            layer_type="continuous",
+            layer_type="discontinuous",
            n=n,
            in_width=in_width,
            in_segments=in_segments,
@@ -40,7 +40,7 @@ def __init__(
            normalization=normalization
        )
        self.equal_layers = HighOrderMLP(
-            layer_type="continuous",
+            layer_type="discontinuous",
            n=n,
            in_width=2 * out_width,
            out_width=out_width,
