
Loss increases gradually #225

Open
binary-person opened this issue Mar 19, 2019 · 1 comment

binary-person commented Mar 19, 2019

As I get to around iteration 70000, the loss and val_loss seem to increase and then decrease. I'm training on a 2.3 GB dataset with the following command:

th train.lua -input_h5 data/all.h5 -input_json data/all.json -model_type lstm -num_layers 3 -rnn_size 512 -batch_size 100 -seq_length 100 -print_every 1 -checkpoint_every 10000 -reset_iterations 0 -max_epochs 10

Here's a log file of the entire training run:
log-file.txt

antihutka (Contributor) commented

In my experience, multiple things can cause this. Your learning rate might be too high, or it might need to decay faster for a large dataset. For example, this is the script I use for my 230 MB dataset (using my torch-rnn fork, which has extra features):

# Common training options (fork-specific flags included) and model definition
BASECMD='th train.lua -input_h5 data/combined-latest.h5 -input_json data/combined-latest.json -gpu 0 -gpu_opt -2 -low_mem_dropout 0 -dropout 0.10 -shuffle_data 1 -zoneout 0.01'
MODEL='-seq_offset 1 -model_type gridgru -wordvec_size 1024 -rnn_size 2048 -num_layers 4'
CPNAME='cv/combined-20190212'
CPINT=10214   # checkpoint interval in iterations
CMD="$BASECMD $MODEL -checkpoint_every $CPINT"
export CUDA_VISIBLE_DEVICES=1,2
mkdir -p $CPNAME

# Train in stages: each stage doubles seq_length, halves batch_size, lowers the
# learning rate, and resumes from the previous stage's last checkpoint.
$CMD -batch_size 128 -seq_length 256  -max_epochs 6  -learning_rate 4e-4   -lr_decay_every 2 -lr_decay_factor 0.5 -checkpoint_name $CPNAME/a -print_every 250
$CMD -batch_size 64  -seq_length 512  -max_epochs 10 -learning_rate 5e-5   -lr_decay_every 2 -lr_decay_factor 0.7 -checkpoint_name $CPNAME/b -print_every 250 -init_from $CPNAME/a_$(($CPINT*6)).t7  -reset_iterations 0
$CMD -batch_size 32  -seq_length 1024 -max_epochs 14 -learning_rate 2.5e-5 -lr_decay_every 2 -lr_decay_factor 0.7 -checkpoint_name $CPNAME/c -print_every 250 -init_from $CPNAME/b_$(($CPINT*10)).t7 -reset_iterations 0
$CMD -batch_size 16  -seq_length 2048 -max_epochs 16 -learning_rate 1.2e-5 -lr_decay_every 1 -lr_decay_factor 0.5 -checkpoint_name $CPNAME/d -print_every 250 -init_from $CPNAME/c_$(($CPINT*14)).t7 -reset_iterations 0
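
As a rough sketch only (the exact numbers here are guesses, not something I've tuned), the same idea using the stock torch-rnn flags -learning_rate, -lr_decay_every and -lr_decay_factor, applied to the command from the original post, might look like:

th train.lua -input_h5 data/all.h5 -input_json data/all.json \
  -model_type lstm -num_layers 3 -rnn_size 512 \
  -batch_size 100 -seq_length 100 -max_epochs 10 \
  -learning_rate 2e-3 -lr_decay_every 1 -lr_decay_factor 0.7 \
  -print_every 1 -checkpoint_every 10000 -reset_iterations 0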

It could also be caused by not randomly ordering the training sequences; shuffling is also implemented in my fork (the -shuffle_data flag above), and it made my model train more smoothly.
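
For illustration only (this is the general idea, not the fork's actual code, and the tensor name seqs is made up), shuffling the visiting order of pre-cut training sequences in Torch can be sketched like this:

-- seqs: hypothetical N x seq_length tensor of pre-cut training sequences
local order = torch.randperm(seqs:size(1)):long()  -- fresh random order each epoch
for i = 1, order:size(1) do
  local seq = seqs[order[i]]   -- visit sequences in shuffled order
  -- run the normal forward/backward training step on seq here
end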
