How does llama.cpp manage the memory? #6324
Replies: 3 comments 14 replies
-
Hi, the inputs and temporary tensors used in the computation graph are allocated by …
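To illustrate the general idea behind allocating temporary tensors for a computation graph, here is a toy sketch of a graph memory planner. This is NOT llama.cpp's actual allocator; it is a hypothetical, simplified model of the common technique: walk the graph in execution order, and once a tensor's last consumer has run, return its buffer to a free list so later tensors can reuse it, keeping peak memory low.

```python
# Toy graph-memory planner (hypothetical, for illustration only).
# graph: list of (name, size_bytes, input_names) in execution order.
def plan(graph):
    sizes = {name: size for name, size, _ in graph}
    # Record the index of each tensor's last consumer.
    last_use = {}
    for i, (_, _, inputs) in enumerate(graph):
        for inp in inputs:
            last_use[inp] = i
    free = []            # freed (offset, size) blocks available for reuse
    offsets, peak = {}, 0
    for i, (name, size, inputs) in enumerate(graph):
        # First-fit: reuse a freed block if one is big enough...
        for j, (off, sz) in enumerate(free):
            if sz >= size:
                offsets[name] = off
                free.pop(j)
                break
        else:
            # ...otherwise grow the arena.
            offsets[name] = peak
            peak += size
        # Release inputs whose last consumer is this node. Note the
        # release happens AFTER this node's output is placed, so an op
        # never aliases its own inputs (no in-place ops in this sketch).
        for inp in inputs:
            if last_use.get(inp) == i:
                free.append((offsets[inp], sizes[inp]))
    return offsets, peak

# Chain a -> b -> c of 64-byte tensors: c can reuse a's buffer once a
# is dead, so peak memory is 128 bytes instead of 192.
g = [("a", 64, []), ("b", 64, ["a"]), ("c", 64, ["b"])]
offsets, peak = plan(g)
```

A real allocator must additionally handle tensor alignment, views, in-place operations, and multiple backend buffers, but the reuse-after-last-use principle is the same.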
-
Hi slaren, thank you. …
-
@slaren Hi slaren, I wonder whether executing this graph is an asynchronous process. If some nodes of the graph run on the CPU and others on the GPU, is there a possibility of parallelism? Is there an existing issue or discussion about this? |
-
Hello, I'm wondering how llama.cpp manages memory:
I hope I described my confusion properly. Thanks for your attention.