
Loss increases gradually #225

Open
binary-person opened this issue Mar 19, 2019 · 1 comment

binary-person commented Mar 19, 2019

As I get to around iteration 70000, the loss and val_loss seem to increase and then decrease. I'm training on a 2.3 GB dataset with the following command:

th train.lua -input_h5 data/all.h5 -input_json data/all.json -model_type lstm -num_layers 3 -rnn_size 512 -batch_size 100 -seq_length 100 -print_every 1 -checkpoint_every 10000 -reset_iterations 0 -max_epochs 10

Here's a log file of the entire training run:
log-file.txt

antihutka (Contributor) commented

In my experience, multiple things can cause this. Your learning rate might be too high, or it might need to decay faster for a large dataset. For example, this is the script I use for my 230 MB dataset (using my torch-rnn fork, which has extra features):

# Common training options (fork-specific flags included) and model definition
BASECMD='th train.lua -input_h5 data/combined-latest.h5 -input_json data/combined-latest.json -gpu 0 -gpu_opt -2 -low_mem_dropout 0 -dropout 0.10 -shuffle_data 1 -zoneout 0.01'
MODEL='-seq_offset 1 -model_type gridgru -wordvec_size 1024 -rnn_size 2048 -num_layers 4'
CPNAME='cv/combined-20190212'
CPINT=10214   # checkpoint interval in iterations
CMD="$BASECMD $MODEL -checkpoint_every $CPINT"
export CUDA_VISIBLE_DEVICES=1,2
mkdir -p $CPNAME

# Train in stages: each stage doubles seq_length, halves batch_size, lowers the
# learning rate, and resumes from the previous stage's last checkpoint.
$CMD -batch_size 128 -seq_length 256  -max_epochs 6  -learning_rate 4e-4   -lr_decay_every 2 -lr_decay_factor 0.5 -checkpoint_name $CPNAME/a -print_every 250
$CMD -batch_size 64  -seq_length 512  -max_epochs 10 -learning_rate 5e-5   -lr_decay_every 2 -lr_decay_factor 0.7 -checkpoint_name $CPNAME/b -print_every 250 -init_from $CPNAME/a_$(($CPINT*6)).t7  -reset_iterations 0
$CMD -batch_size 32  -seq_length 1024 -max_epochs 14 -learning_rate 2.5e-5 -lr_decay_every 2 -lr_decay_factor 0.7 -checkpoint_name $CPNAME/c -print_every 250 -init_from $CPNAME/b_$(($CPINT*10)).t7 -reset_iterations 0
$CMD -batch_size 16  -seq_length 2048 -max_epochs 16 -learning_rate 1.2e-5 -lr_decay_every 1 -lr_decay_factor 0.5 -checkpoint_name $CPNAME/d -print_every 250 -init_from $CPNAME/c_$(($CPINT*14)).t7 -reset_iterations 0
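
As a rough sketch only (the exact numbers here are guesses, not something I've tuned), the same idea using the stock torch-rnn flags -learning_rate, -lr_decay_every and -lr_decay_factor, applied to the command from the original post, might look like:

th train.lua -input_h5 data/all.h5 -input_json data/all.json \
  -model_type lstm -num_layers 3 -rnn_size 512 \
  -batch_size 100 -seq_length 100 -max_epochs 10 \
  -learning_rate 2e-3 -lr_decay_every 1 -lr_decay_factor 0.7 \
  -print_every 1 -checkpoint_every 10000 -reset_iterations 0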

It could also be caused by not randomly ordering the training sequences; shuffling is also implemented in my fork (the -shuffle_data flag above), and it made my model train more smoothly.
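
For illustration only (this is the general idea, not the fork's actual code, and the tensor name seqs is made up), shuffling the visiting order of pre-cut training sequences in Torch can be sketched like this:

-- seqs: hypothetical N x seq_length tensor of pre-cut training sequences
local order = torch.randperm(seqs:size(1)):long()  -- fresh random order each epoch
for i = 1, order:size(1) do
  local seq = seqs[order[i]]   -- visit sequences in shuffled order
  -- run the normal forward/backward training step on seq here
end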
