TorchSharp memory issue #1278
However, this is the only way it can be written.
Perhaps this page could help: https://github.com/dotnet/TorchSharp/wiki/Memory-Management
I'm kind of considering giving up: unexpected exceptions occur when torch.NewDisposeScope is nested, especially when one function calls another and the called function also creates a torch.NewDisposeScope. Objects that shouldn't be disposed of are being released. It makes training AI with C# seem not very realistic, which is quite frustrating.
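For reference, the pattern that wiki page describes for exactly this nested-scope situation looks roughly like the sketch below. It is only an illustration (the `MakeBoxes` helper is hypothetical, not code from this issue): a tensor created inside a scope is disposed when that scope closes unless it is explicitly handed to the enclosing scope with `MoveToOuterDisposeScope()`.

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

static Tensor MakeBoxes()
{
    using (torch.NewDisposeScope())
    {
        var tmp = torch.randn(4, 4);        // attached to this (inner) scope
        var result = tmp.clamp(0, 1);       // also attached to this scope
        // Without this call, `result` would be disposed as soon as the scope ends.
        return result.MoveToOuterDisposeScope();
    }
}

using (torch.NewDisposeScope())
{
    var boxes = MakeBoxes();                // now owned by this (outer) scope
    Console.WriteLine(string.Join("x", boxes.shape));
}                                           // boxes is disposed here
```

If `MakeBoxes` returned `result` without moving it out, the caller would receive an already-disposed tensor once the inner scope closed, which may be what the "objects that shouldn't be disposed of are being released" symptom looks like.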
I'm sorry to hear that. Since you mentioned that 'objects that shouldn't be disposed of are being released', I have a guess.
It might be because of something like this:

```csharp
using TorchSharp;

// Note: the loop bound exceeds int.MaxValue, so the counter needs to be long.
for (long i = 0; i < 10000000000; i++)
{
    using (torch.NewDisposeScope())
    {
        var f = torch.randn(1, 3, 224, 224).@float().cuda();
        using (torch.NewDisposeScope())
        {
            var f3 = torch.randn(1, 3, 224, 224).@float().cuda();
            f[..] = f3;
        }
    }
}
Console.ReadKey();
```

By the way, are you a Chinese user? I have just created a QQ group (957204993), so perhaps we could discuss there with instant messages, which might be more convenient.
I suppose that's because of how the related dispose scope works. When using MoveToOuterDisposeScope(), the tensor is re-attached to the enclosing scope; after using it twice there is no dispose scope left for it, and then it leaks. (Only the tensors/parameters that are created in a dispose scope are automatically attached to it, and in-place actions will not modify their dispose scope.)
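To illustrate the rule in that parenthetical, here is a minimal sketch of my own (not code from this issue): the tensor written into in place keeps the scope it was created in, while the tensor created inside the inner scope is disposed with that scope.

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

using (torch.NewDisposeScope())
{
    var f = torch.randn(1, 3, 224, 224);        // created here -> attached to the outer scope

    using (torch.NewDisposeScope())
    {
        var f3 = torch.randn(1, 3, 224, 224);   // created here -> attached to the inner scope
        f[TensorIndex.Ellipsis] = f3;           // in-place write; f stays in the outer scope
    }                                           // inner scope ends: f3 is disposed here

    Console.WriteLine(string.Join("x", f.shape));   // f is still alive and usable
}                                                   // outer scope ends: f is disposed here
```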
Yes, the simulated code can resolve the issue by removing MoveToOuterDisposeScope(), but for the code where the actual memory leak occurs, I cannot handle it that way. I have to modify the code as follows, which is very confusing to me.
The fix is just adding .clone(), which is very bizarre.
You could just remove that:

```csharp
public static Tensor clip_boxes(Tensor boxes, int[] shape)
{
    using (torch.NewDisposeScope())
    {
        boxes[TensorIndex.Ellipsis, 0] = boxes[TensorIndex.Ellipsis, 0].clamp(0, shape[1]);
        boxes[TensorIndex.Ellipsis, 1] = boxes[TensorIndex.Ellipsis, 1].clamp(0, shape[0]);
        boxes[TensorIndex.Ellipsis, 2] = boxes[TensorIndex.Ellipsis, 2].clamp(0, shape[1]);
        boxes[TensorIndex.Ellipsis, 3] = boxes[TensorIndex.Ellipsis, 3].clamp(0, shape[0]);
        return boxes;
    }
}
```

I suppose there is no problem with this. Are you worried about anything else?
No no no, your code causes a memory leak in my project, but adding .clone() fixes it. However, the simulated code still leaks memory even with .clone() added. Please trust me, there is still an issue with torch.NewDisposeScope().
Actually, your clip_boxes could be written as:

```csharp
public static Tensor clip_boxes(Tensor boxes, int[] shape)
{
    using (torch.NewDisposeScope())
    {
        boxes[TensorIndex.Ellipsis, 0] = boxes[TensorIndex.Ellipsis, 0].clamp(0, shape[1]);
        boxes[TensorIndex.Ellipsis, 1] = boxes[TensorIndex.Ellipsis, 1].clamp(0, shape[0]);
        boxes[TensorIndex.Ellipsis, 2] = boxes[TensorIndex.Ellipsis, 2].clamp(0, shape[1]);
        boxes[TensorIndex.Ellipsis, 3] = boxes[TensorIndex.Ellipsis, 3].clamp(0, shape[0]);
        return boxes.clone().MoveToOuterDisposeScope();
    }
}
```

Or:

```csharp
public static Tensor clip_boxes(Tensor boxes, int[] shape)
{
    using (torch.NewDisposeScope())
    {
        boxes = boxes.clone();
        boxes[TensorIndex.Ellipsis, 0] = boxes[TensorIndex.Ellipsis, 0].clamp(0, shape[1]);
        boxes[TensorIndex.Ellipsis, 1] = boxes[TensorIndex.Ellipsis, 1].clamp(0, shape[0]);
        boxes[TensorIndex.Ellipsis, 2] = boxes[TensorIndex.Ellipsis, 2].clamp(0, shape[1]);
        boxes[TensorIndex.Ellipsis, 3] = boxes[TensorIndex.Ellipsis, 3].clamp(0, shape[0]);
        return boxes.MoveToOuterDisposeScope();
    }
}
```

But I suppose there is no reason to use it here.
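As a way to check which variant actually leaks, here is a rough sketch of my own (not from this thread). It assumes the Tensor.TotalCount live-tensor counter mentioned in the Memory-Management wiki is available in your TorchSharp version, and that one of the clip_boxes variants above is in scope.

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

var before = Tensor.TotalCount;                 // live native tensor handles so far

for (int i = 0; i < 1_000; i++)
{
    using (torch.NewDisposeScope())
    {
        var boxes = torch.rand(8, 4) * 500;     // fake detections
        var clipped = clip_boxes(boxes, new[] { 480, 640 });
    }                                           // everything created above should die here
}

Console.WriteLine($"live tensors: before={before}, after={Tensor.TotalCount}");
// If the "after" number grows with the iteration count, tensors are escaping their scopes.
```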
Please watch the video I recorded.
Hmm, I'm really not sure about that. Is it possible to share the whole project with me?
Sorry about that, it’s not convenient at the moment.
My only guess is that, because of the higher memory usage (CPU memory, not GPU memory), the garbage collector is activated and the escaped tensors are thus released?
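One way to test that guess (a sketch of my own, not code from this issue) is to force a full collection at a known point and watch whether the native/GPU memory drops; if it does, the leaked tensors are only being reclaimed by finalizers rather than by their dispose scope.

```csharp
using System;

// Diagnosis only: forcing GC in a hot loop is not a fix; it just tells us whether
// the leaked tensors are reachable only through finalization.
GC.Collect();                     // run a full managed collection
GC.WaitForPendingFinalizers();    // let tensor finalizers release their native memory
GC.Collect();                     // collect whatever the finalizers made unreachable
```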
Not too sure; I haven't found the exact cause yet, it's a bit odd.
This can also solve the problem of memory leaks.
In PyTorch, GPU memory is released at the appropriate time during GPU inference. In TorchSharp, GPU inference leaks GPU memory unless it is released manually.
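For completeness, here is a minimal sketch (my own, with a hypothetical Predict wrapper and model parameter) of the inference pattern that releases CUDA memory deterministically instead of waiting for the garbage collector:

```csharp
using TorchSharp;
using static TorchSharp.torch;

Tensor Predict(nn.Module<Tensor, Tensor> model, Tensor input)
{
    using (torch.no_grad())                  // no autograd graph -> fewer retained tensors
    using (torch.NewDisposeScope())
    {
        var output = model.forward(input.cuda());
        // Only the returned tensor survives; every intermediate CUDA tensor is disposed
        // as soon as this scope closes, so GPU memory is not held until the next GC.
        return output.cpu().MoveToOuterDisposeScope();
    }
}
```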