
Question: How to ensure my Intel Arc A770 GPU is used? #1083

Open
BDDwaCT opened this issue Oct 25, 2024 · 4 comments
Labels
question Further information is requested

Comments

@BDDwaCT commented Oct 25, 2024

What is your question?

After reading the documentation, I am still not clear on how to get my Intel Arc A770 GPU used instead of my CPU. Any suggestions would be appreciated. Thanks in advance.

@BDDwaCT added the question (Further information is requested) label on Oct 25, 2024
@eugeis (Collaborator) commented Oct 26, 2024

Hi, Fabric does not host any LLM on its own; it uses different AI vendors.
For example, you can use Ollama to host open-source models, and check the Ollama documentation for how to make sure the GPU is used. When you start Ollama, its startup output shows whether the GPU was recognized and is being used.
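
Roughly, the flow looks like this (I'm writing the Fabric flags from memory, so double-check them with fabric --help for your version; the model name is just an example):

```sh
# Start the Ollama server; its startup log reports which GPUs, if any,
# were detected before any model is loaded.
ollama serve

# In another terminal, make sure a model actually loads and answers.
ollama run llama3.1 "hello"

# Configure Fabric to use the local Ollama endpoint
# (Ollama's default API address is http://localhost:11434).
fabric --setup

# Then pick an Ollama-hosted model when running a pattern, e.g.:
echo "some text to summarize" | fabric --pattern summarize --model llama3.1
```

If Ollama's startup log shows no GPU, Fabric cannot change that; GPU selection happens entirely on the Ollama side.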

@BDDwaCT (Author) commented Oct 26, 2024

I agree and understand that Fabric itself is not the LLM. I apologize for my naivete up front, but I guess my problem is tying Ollama to Fabric and ensuring they work well together. I have gotten Ollama to work with my Intel Arc A770 GPU recently on a side project using miniforge etc., but it's not consistent. I know that is not the concern of this project, but I love the project and the possibilities it has for everyone who can successfully run it, preferably on a GPU... HA! So, long story short, if you (or anyone) have any great suggestions or articles to try or read on my issue, I would greatly appreciate it.

@jaredmontoya (Contributor) commented

Use ollama ps to check which device the model is running on, for example as shown below. Models can be partially or fully offloaded to RAM if they don't fit into VRAM.
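
For example (illustrative output; the exact columns can differ between Ollama versions):

```sh
$ ollama ps
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.1:8b    42182419e950    6.7 GB    100% GPU     4 minutes from now
```

If the PROCESSOR column shows 100% CPU, or a split such as 48%/52% CPU/GPU, the model did not fit entirely into VRAM and part of it is running on the CPU.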

@SamAcctX commented

It should also be noted that Ollama has a bad habit of unloading models if they haven't been actively used in the past 5 minutes. There's an environment variable you can set that will have Ollama keep the model loaded longer.

To have a longer delay before a model is unloaded, set the OLLAMA_KEEP_ALIVE environment variable on your Ollama server, for example as shown below.
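
A minimal sketch (OLLAMA_KEEP_ALIVE takes a duration like 10m or 1h, or -1 to keep the model loaded indefinitely; the systemd unit name assumes the standard Linux install script):

```sh
# Keep loaded models in memory for an hour of inactivity
# instead of the default 5 minutes.
OLLAMA_KEEP_ALIVE=1h ollama serve

# If Ollama runs as a systemd service, set the variable there instead:
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_KEEP_ALIVE=1h"
sudo systemctl restart ollama
```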
