To what part in the equations does each part of the loss correspond? #18
I'd love to see KL-Loss implemented in PyTorch. Let me know when it's done. :)
I did try to reproduce KL-Loss in YOLOv3 with TensorFlow, but it failed. During training, bbox_pred_std_abs_logw_loss becomes a very large negative number, resulting in a final loss of NaN.
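The symptom described above is consistent with an unbounded log-variance head: the regression term scales with exp(-alpha), so a log-variance that drifts far below zero makes the loss overflow. A minimal sketch of the blow-up and one common mitigation (clamping alpha before the exponential). This is my assumption about the failure mode, not code from the repo; the function names are made up for illustration:

```python
import math

def bbox_reg_term(diff, alpha):
    # per-coordinate regression term: 0.5 * exp(-alpha) * diff^2 + 0.5 * alpha,
    # where alpha = log(sigma^2) is the predicted log-variance
    return 0.5 * math.exp(-alpha) * diff * diff + 0.5 * alpha

def bbox_reg_term_clamped(diff, alpha, lo=-10.0, hi=10.0):
    # hypothetical fix: clamp alpha so exp(-alpha) stays bounded
    a = min(max(alpha, lo), hi)
    return 0.5 * math.exp(-a) * diff * diff + 0.5 * a
```

With a small residual but a very negative alpha, the unclamped term is astronomically large (which then overflows once squared errors grow), while the clamped version stays finite.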
@yihui-he, I have another question. You basically have two different losses: one for when |xg-xe| > 1 and another one otherwise. I was wondering in what range the predictions of xe lie. Or, more precisely: are the images resized to have a height and width between 0 and 1, resulting in |xg-xe| < 1 for almost all predictions? @EternityZY, that's unfortunate. If I get it to work in PyTorch, I'll let you know.
@JappaB you are right. Bounding boxes are rescaled so that image height and width are 1x1. It's just for robustness; it resembles the smooth L1 loss.
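The two-branch loss discussed above can be sketched per coordinate as follows. This is a plain-Python sketch based on my reading of the KL-Loss paper's formulation, not the authors' code; in a real PyTorch model, alpha would come from an extra log-variance head and the branch would be selected with torch.where:

```python
import math

def kl_reg_loss(xe, xg, alpha):
    """Sketch of the per-coordinate KL-Loss regression term.

    xe:    predicted box coordinate (normalized, so |xg - xe| <= 1
           for almost all samples, as discussed above)
    xg:    ground-truth coordinate
    alpha: predicted log-variance, alpha = log(sigma^2)
    """
    diff = abs(xg - xe)
    if diff <= 1.0:
        # quadratic branch, analogous to the |d| < 1 case of smooth L1
        term = 0.5 * math.exp(-alpha) * diff * diff
    else:
        # linear branch keeps the gradient bounded for outliers
        term = math.exp(-alpha) * (diff - 0.5)
    return term + 0.5 * alpha
```

Like smooth L1, the two branches meet at |xg - xe| = 1, so the loss is continuous; with alpha fixed at 0 it reduces to the ordinary smooth-L1-style loss.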
Thanks again for the fast response. I'll close this issue for now, but perhaps I'll ask some other questions later down the line.
OK! Waiting for your good news!
I'm currently training my PyTorch SSD with it. So far the loss goes down in a way I would expect. I'll let you know when I've finished training whether it learned something interesting. I can't share the complete code of the SSD (yet), but I'll make the KL-Loss function public this week if it really works. EDIT: never mind, I also get NaNs during training after some time. I'll try to figure out why it happens.
Alright, there was still a bug, but I'm able to train an SSD with it now and the results look reasonable so far. I've only tested it with a PyTorch SSD, but I'm fairly certain it should work with any detection framework. I don't have results comparing it with the normal loss function yet; I'm still doing some hyperparameter tuning (learning rate, learning rate schedule, etc.). @yihui-he, how would you like to do this:
I don't have a preference. To be clear, for now it will only be a single file with the KL-Loss part implemented in PyTorch. Later I can look at sharing the SSD as well.
@JappaB I guess the second way is better, since this repo is based on caffe2 |
@yihui-he, the SSD with the KL-Loss performs (quite a bit) worse than the SSD with the normal loss. Do you know whether the person who improved YOLO with the KL-Loss did it with YOLOv3, YOLOv2/YOLO9000, or the original YOLO?
If it is not that, then I might still be doing something wrong in the implementation... @EternityZY, once I'm more certain that I didn't screw up, I'll release the code.
@JappaB YOLO-Lite (mAP 79.4%) https://github.com/Stinky-Tofu/Stronger-yolo. He told me mAP70, mAP75, and mAP90 improved by 4%, 8%, and 8% respectively on the VOC2007 test set, though mAP50 drops by 1%.
@yihui-he,
@JappaB no way to debug this without looking at the code. |
@yihui-he, with pixel level, do you mean that this transformation takes place before any resizing to a fixed input size, and before the image and the bounding boxes are scaled to be between 0 and 1? I do use a transformation that is a bit different, without the -1. But as Ross Girshick mentions in the comment at the top of boxes.py: "in practice, as long as a model is trained and tested with a consistent convention either decision seems to be ok (at least in our experience on COCO)". Btw, thanks for all your replies and thinking along.
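For readers unfamiliar with the "-1" mentioned above: it refers to the inclusive pixel-index convention for box widths used in Detectron's boxes.py, versus the continuous-coordinate convention without it. A tiny illustration of the two conventions (function names are mine, for illustration only); as the quoted comment says, either works as long as it is applied consistently:

```python
def width_exclusive(x1, x2):
    # continuous-coordinate convention: a box from 0.0 to 9.0 is 9 units wide
    return x2 - x1

def width_inclusive(x1, x2):
    # legacy pixel-index convention: pixels 0..9 inclusive span 10 pixels
    return x2 - x1 + 1
```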
@JappaB ok, maybe there're some other issues we don't know yet |
@yihui-he, if I take the time to break the SSD with KL-Loss out of my current repo, make a new repo with it (including training and evaluation scripts), and share it with you, do you think you'd have time to go through it?
@JappaB sure. point to the critical parts where u make changes |
@JappaB @EternityZY FYI, YOLO with KL-loss is released https://github.com/wlguan/Stronger-yolo-pytorch |
@JappaB could you please share the KL-Loss with SSD if you have completed the implementation?
Hi @yihui-he,
I found this issue: #7, where it is mentioned that there are 3 parts to the KL-Loss:
- The normal bbox regression loss, loss_bbox (basically the mean of the bbox coordinate prediction)
- bbox_pred_std_abs_logw_loss
- bbox_pred_std_abs_mulw_loss
I have a couple of questions. Firstly, to what part of which formula in the paper does each of the above correspond? Similarly, what do bbox_inside_weights, bbox_outside_weights, and 'val' (in comments, e.g. line 120) correspond to?
Secondly, I wondered how you backpropagate the gradients from the loss function, as you use the 'StopGradient' function. Do you backpropagate the gradients from all three components through the whole network, or only the normal bbox regression loss part?
I've never used Caffe2 before, so it has taken quite a bit of work to get a feel for the code. As I am trying to implement your work in a (PyTorch) SSD, I want to be sure I get the details right.
@EternityZY,
I saw you attempted to implement the KL-Loss in YOLOv3. Did you succeed?
As I'm trying to implement the KL-Loss in SSD (a PyTorch version), your YOLOv3 implementation might have some overlap or give some intuition. Would you be willing to share your code?