
Question about CUDA memory requirements to run code #7

Open
frostfox661 opened this issue Oct 30, 2024 · 3 comments


@frostfox661

When I run the file "folio-direct-llm.py" on a single 4090 with llama-7b, I often hit CUDA out-of-memory errors. After adding code to clear the CUDA cache in each case loop and monitoring memory usage, I found that CUDA memory still grows cumulatively at a specific point. What is going on? What hardware environment is required to run this project?
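
Concretely, the per-case cleanup I added looks roughly like this (the loop structure and the `run_case` helper are illustrative, not the actual script code):

```python
import gc
import torch

for case in cases:
    result = run_case(case)  # hypothetical helper that queries the model
    # Drop references and release cached allocator blocks between cases.
    del result
    gc.collect()
    torch.cuda.empty_cache()
    # Monitor memory: tensors currently allocated vs. reserved by the allocator.
    print(f"allocated={torch.cuda.memory_allocated() / 2**30:.2f} GiB, "
          f"reserved={torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```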
[screenshot of GPU memory monitoring attached]

@yifanzhang-pro
Member

yifanzhang-pro commented Oct 30, 2024

The guidance library might have a caching mechanism for multiple queries over the same context; we suggest running it on an A100-80GB.
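
As a rough back-of-the-envelope estimate (approximate numbers only, not a measured profile), the fp16 weights of a 7B model alone take around 13 GiB, so a 24 GB 4090 leaves little headroom once the KV cache and any cached contexts accumulate:

```python
# Back-of-the-envelope estimate (approximate; real usage also includes
# KV cache, activations, and allocator fragmentation).
params = 7e9            # llama-7b parameter count, roughly
bytes_per_param = 2     # fp16
weights_gib = params * bytes_per_param / 2**30
print(f"weights alone: ~{weights_gib:.1f} GiB")  # ~13.0 GiB of a 24 GiB 4090
```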

@frostfox661
Author

Thank you for your reply.
What are the particular advantages of using the guidance library in this project? Can other libraries be used as alternatives?

@yifanzhang-pro
Member

You may try using alternative libraries, though the prompt may need adjustment for compatibility with different models and libraries, and the results may vary.
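
For instance, here is a minimal sketch of querying the model with plain Hugging Face transformers instead of guidance (the model ID, prompt, and generation settings are placeholders, and the project's prompts would need reformatting for plain completion):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; substitute the llama-7b checkpoint you use.
model_id = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # a FOLIO prompt, reformatted for plain completion
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```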
