[Question] About CPU performance #1666

Open
mingfeima opened this issue Sep 3, 2024 · 0 comments
Labels: enhancement (New feature or request)

Comments

@mingfeima

Hi, I am an engineer from Intel, and I work mostly on performance optimization of PyTorch on Intel Xeon CPUs (I am also the PyTorch module maintainer for CPU performance). I just came across this amazing project, and the chart in the blog post fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse says that DeepSparse accelerates the sparse-quantized Llama 2 models to 6-8x faster than the dense FP32 baseline.

[Figure: chart from the blog showing the sparse-quantized Llama 2 models running 6-8x faster than the dense FP32 baseline]

The 6-8x speedup of the sparse model over the dense model is a fascinating result. My goal is to check whether there is a chance to further improve performance with our previous work on LLM optimizations.

I ran the script from https://github.com/neuralmagic/deepsparse?tab=readme-ov-file#try-it-now. However, the hardware profiler shows that hardware efficiency is still not very high: only ~12 cores are in use on average on a 40-core machine, leading to significant synchronization overhead and a very high CPI (cycles per instruction). Maybe I can do something to improve this, but I am not very familiar with this codebase, so I need some guidance here (a sketch of my invocation follows the questions below):

  • How can I reproduce the results above?
  • How is the model deployed? With ONNX Runtime?
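
For reference, here is roughly how I invoked the model, following the README's "Try it now" example. This is a minimal sketch: the SparseZoo model stub is a placeholder, since I am not sure which stub corresponds to the Llama 2 checkpoints from the blog.

```python
# Minimal sketch following the DeepSparse README "Try it now" example.
from deepsparse import TextGeneration

# Placeholder: substitute the sparse-quantized Llama 2 stub from SparseZoo here.
MODEL_STUB = "zoo:..."

pipeline = TextGeneration(model=MODEL_STUB)

prompt = "Write a quicksort function in Python."
# Generate and print the completion; this is the loop I profiled.
print(pipeline(prompt, max_new_tokens=128).generations[0].text)
```

If there is a recommended knob for controlling the number of worker threads (a num_cores option, if that is still the right one), a pointer would also help me correlate the profiler data with the engine's threading behavior.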

Additionally, are you continuing this sparse fine-tuning work on other models, for example Llama 3? And what about int4 quantization?
