This repository has been archived by the owner on Dec 23, 2022. It is now read-only.

loss didn't decrease #5

Open
bamps53 opened this issue Feb 6, 2021 · 2 comments

bamps53 commented Feb 6, 2021

Describe the bug
After fixing the lr, I ran the DETR training, but the loss doesn't seem to decrease at all.
I know DETR converges slowly, but is this loss behavior expected?

To Reproduce
Run this notebook:
https://colab.research.google.com/github/Emgarr/kerod/blob/master/notebooks/detr_coco_training_multi_gpu.ipynb

Expected behavior
The training loss decreases over epochs.

Screenshots

Epoch 1/300
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
[this warning is repeated 24 times in the original log]
  34458/Unknown - 16564s 479ms/step - loss: 31.4098 - giou_last_layer: 1.7223 - l1_last_layer: 1.3152 - scc_last_layer: 2.1834 - sparse_categorical_accuracy: 0.5316 - object_recall: 5.6169e-04
WARNING:tensorflow:Using a while_loop for converting EagerPyFunc
[this warning is repeated 12 times in the original log]
34458/34458 [==============================] - 16938s 490ms/step - loss: 31.4098 - giou_last_layer: 1.7223 - l1_last_layer: 1.3152 - scc_last_layer: 2.1834 - sparse_categorical_accuracy: 0.5316 - object_recall: 5.6169e-04 - val_loss: 31.0949 - val_giou_last_layer: 1.7153 - val_l1_last_layer: 1.2824 - val_scc_last_layer: 2.1859 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 2/300
34458/34458 [==============================] - 16649s 483ms/step - loss: 31.5205 - giou_last_layer: 1.7350 - l1_last_layer: 1.3259 - scc_last_layer: 2.1788 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.5231 - val_giou_last_layer: 1.7450 - val_l1_last_layer: 1.2715 - val_scc_last_layer: 2.2023 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 3/300
34458/34458 [==============================] - 15912s 462ms/step - loss: 31.5544 - giou_last_layer: 1.7355 - l1_last_layer: 1.3301 - scc_last_layer: 2.1814 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.5587 - val_giou_last_layer: 1.7398 - val_l1_last_layer: 1.2982 - val_scc_last_layer: 2.1964 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 4/300
34458/34458 [==============================] - 15974s 463ms/step - loss: 31.5491 - giou_last_layer: 1.7391 - l1_last_layer: 1.3330 - scc_last_layer: 2.1796 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.4192 - val_giou_last_layer: 1.7525 - val_l1_last_layer: 1.3120 - val_scc_last_layer: 2.1949 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 5/300
34458/34458 [==============================] - 16581s 481ms/step - loss: 31.4819 - giou_last_layer: 1.7322 - l1_last_layer: 1.3308 - scc_last_layer: 2.1796 - sparse_categorical_accuracy: 0.5319 - object_recall: 0.0000e+00 - val_loss: 31.6360 - val_giou_last_layer: 1.7783 - val_l1_last_layer: 1.3163 - val_scc_last_layer: 2.1977 - val_sparse_categorical_accuracy: 0.5282 - val_object_recall: 0.0000e+00
Epoch 6/300
 1580/34458 [>.............................] - ETA: 4:21:11 - loss: 31.5871 - giou_last_layer: 1.7425 - l1_last_layer: 1.3287 - scc_last_layer: 2.1863 - sparse_categorical_accuracy: 0.5323 - object_recall: 0.0000e+00

Environment
Colab notebook


EmGarr (Owner) commented Feb 6, 2021

I cannot tell you, since I have no GPUs at my disposal. What I can tell you is that the loss does decrease during an overfit on a single image; you can try the detr overfit notebook.
If I had GPUs I would track down the possible bugs, but currently I cannot. I ran the overfit with the same inputs on the official codebase and on kerod, and I ended up with the same results.
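The single-batch (or single-image) overfit check described above is a standard sanity test: a correctly wired model should drive its loss to near zero when trained repeatedly on one fixed batch. The idea is framework-independent; as a minimal sketch (a toy linear model in plain Python, not kerod or TensorFlow code), it looks like this:

```python
# Sanity check: a model that can learn should drive the loss toward zero
# when trained repeatedly on a single fixed batch ("overfit test").
# Toy stand-in for the real DETR check: one linear neuron, one batch,
# plain gradient descent (no framework needed for the illustration).

def overfit_single_batch(xs, ys, lr=0.05, steps=500):
    """Fit y = w*x + b on one fixed batch; return the loss history."""
    w, b = 0.0, 0.0
    history = []
    n = len(xs)
    for _ in range(steps):
        # Mean squared error and its gradients on the same batch every step.
        grad_w = grad_b = loss = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            loss += err * err / n
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        history.append(loss)
        w -= lr * grad_w
        b -= lr * grad_b
    return history

# The batch follows y = 2x + 1, so the model can fit it exactly.
history = overfit_single_batch([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.6f}")
```

If the loss curve stays flat even on a single repeated batch, the bug is in the loss, the matching, or the optimizer wiring rather than in dataset size or schedule; a flat curve only on the full dataset (as in the logs above) points more toward hyperparameters or data pipeline issues.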

EmGarr (Owner) commented Feb 6, 2021

@bamps53 You'll find here a gist of the notebook I used to test my detr implementation:
https://colab.research.google.com/gist/EmGarr/5b7d576a9b3f12683290c99644070f7b/inspect-detr_facebook-similarity-vs-kerod.ipynb

It is mostly code copy-pasted from https://github.com/facebookresearch/detr, together with calls to the kerod library.

What is tested inside of it:

  • For the same inputs, the same similarity matrices
  • For the same inputs, the same loss (I slightly modified the DETR loss so that the background class sits at position 0, as in kerod, instead of position 80 + 1)
  • A small overfit on a single batch
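The label-convention difference mentioned in the second bullet (DETR keeps background at the last class index, kerod at index 0) can be illustrated with a small remapping helper. This is a sketch assuming exactly the conventions described above; the function name and list-based implementation are illustrative, not kerod's API:

```python
NUM_CLASSES = 80  # COCO "thing" classes

def detr_to_kerod_labels(labels, num_classes=NUM_CLASSES):
    """Convert class ids from the DETR convention (background == num_classes,
    foreground in 0..num_classes-1) to the kerod convention (background == 0,
    foreground shifted to 1..num_classes)."""
    return [0 if c == num_classes else c + 1 for c in labels]

# DETR-style targets: 80 marks background, 0..79 are COCO categories.
detr_labels = [80, 0, 17, 80, 79]
print(detr_to_kerod_labels(detr_labels))  # [0, 1, 18, 0, 80]
```

Getting this offset wrong on either side makes the two losses incomparable, which is why the gist normalizes the convention before checking that the similarity matrices and loss values match.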
