Add support for HuggingFace 🤗 inference API #65
Conversation
Or perhaps there's a way to somehow hit the HuggingFace Chat API (https://huggingface.co/docs/text-generation-inference ??) instead of the arguably more official Serverless Inference API? 🤔
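For reference, a text-generation-inference server exposes a POST `/generate` endpoint taking the same `inputs`/`parameters` shape as the serverless API. A minimal sketch in decai's curl-building style, assuming a self-hosted TGI instance (the `tgiHost` URL is a placeholder; Hugging Chat itself doesn't document a public API):

```js
// Sketch only: assumes a self-hosted text-generation-inference server;
// tgiHost is a placeholder, not an official HuggingFace Chat endpoint.
const tgiHost = "http://localhost:8080";
const payload = JSON.stringify({
    inputs: query,
    parameters: { max_new_tokens: 512 }
});
const curlcmd = `curl -s ${tgiHost}/generate ` +
    `-H "Content-Type: application/json" ` +
    `-d '${payload}'`;
// On success TGI responds with {"generated_text": "..."}.
```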
Looks good to me! The huggingface API name is a bit long, so I would suggest using "hf" as an alias. And yeah, I guess it's possible to hit this endpoint without any API key... but not sure if we want to play dirty with them :D
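If we go with the alias, a sketch of the dispatch (assuming decai selects the backend with an if/switch on `decaiApi`, as the other providers do; the exact code in decai.r2.js may differ):

```js
// Sketch: accept both names so "hf" and "huggingface"
// select the same backend function.
if (decaiApi === "hf" || decaiApi === "huggingface") {
    return r2aiHuggingFace(msg, hideprompt);
}
```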
Hi @trufae ... you were right, even with the Pro key the model performance is absolute rubbish compared to Claude :___( I've systematically tested the different models supported in the PRO version and they are light years away from Claude's output :_/ What is bizarre though is that using the Hugging Chat, the output looks very reasonable, even without PRO 🤷🏻 ... maybe we are formatting the input completely wrong through the API endpoint and the model gets confused? (one guess is sketched after the diff below) I might explore the differences between Hugging Chat vs the Serverless Inference API in more detail if we want this to work. I've made the following changes while doing some tests; I thought you might be interested in reviewing them:

diff --git a/decai/decai.r2.js b/decai/decai.r2.js
index 381aba0..b953a23 100644
--- a/decai/decai.r2.js
+++ b/decai/decai.r2.js
@@ -38,11 +38,12 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
let decaiApi = "r2"; // uses /cmd endpoint
let decaiCommands = "pdc";
let decaiLanguage = "C";
- let decaiDebug = false;
+ let decaiDebug = true;
let decaiContextFile = "";
let lastOutput = "";
let decaiCache = false;
let decprompt = "Only show the code with no explanation or introductions. Simplify the code: - take function arguments from comment - remove dead assignments - refactor goto with for/if/while - use better names for variables - simplify as much as possible";
+// let decprompt = "The following will be pseudo-C code. Your task is to simplify the code, take function arguments from comments, remove dead assignments, refactor goto with for/if/while, use better names for variables and simplify as much as possible. The output should be properly indented for an 80 column terminal";
// decprompt += ", comments in function calls may replace arguments and remove unnecessary early variable assignments that happen"
function decaiEval(arg) {
@@ -177,24 +178,34 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
}
function r2aiHuggingFace(msg, hideprompt) {
const hfKey = r2.cmd("'cat ~/.r2ai.huggingface-key").trim();
- const hfModel = "deepseek-ai/DeepSeek-Coder-V2-Instruct";
- //const hfModel = "meta-llama/Llama-3.1-8B-Instruct";
- //const hfModel = "meta-llama/Llama-3.2-1B-Instruct";
- //const hfModel = "Qwen/Qwen2.5-72B-Instruct";
+
+ // Supported models on the PRO subscription: https://github.com/huggingface/hub-docs/blob/main/docs/api-inference/supported-models.md#what-do-i-get-with-a-pro-subscription
+ // ... or perhaps those are supported now?: https://huggingface.co/blog/inference-pro#supported-models ... confusing (outdated/contradicting) docs
+ //const hfModel = "deepseek-ai/DeepSeek-Coder-V2-Instruct"; // Never loads the model, it's always "cold"
+ //const hfModel = "meta-llama/Llama-3.1-8B-Instruct"; // Hallucinates with things like: "BlueFin Bluetooth 5.0 Low Energy Chip from Nordic Semiconductor"
+ //const hfModel = "meta-llama/Llama-3.2-1B-Instruct"; // Not right
+ //const hfModel = "Qwen/Qwen2.5-72B-Instruct"; // Stops halfway a seemingly correct-ish output?
+ const hfModel = "codellama/CodeLlama-13b-hf";
+ //const hfModel = "codellama/CodeLlama-34b-Instruct-hf"; // Absolute rubbish
+ //const hfModel = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"; // Lazy, doesn't even try to produce code, just describes it vaguely in prose
+
if (hfKey === '') {
return "Cannot read ~/.r2ai.huggingface-key";
}
- const query = hideprompt? msg: decprompt + ", Explain this pseudocode in " + decaiLanguage + "\n" + msg;
+
+ const query = hideprompt
+ ? msg
+ : `${decprompt}, Explain this pseudocode in ${decaiLanguage}\n${msg}`;
+
const payload = JSON.stringify({
- inputs: query,
- parameters: {
- max_new_tokens: 5128
- }
+ inputs: query
});
- const curlcmd = `curl -s https://api-inference.huggingface.co/models/${hfModel}
- -H "Authorization: Bearer ${hfKey}"
- -H "Content-Type: application/json"
+ const curlcmd = `curl -X POST -s https://api-inference.huggingface.co/models/${hfModel} \
+ -H "Authorization: Bearer ${hfKey}" \
+ -H "Content-Type: application/json" \
+ -H "x-wait-for-model: true" \
-d '${payload}'`.replace(/\n/g, "");
+
//if (decaiDebug) {
// console.log(curlcmd);
//}
@@ -207,6 +218,7 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
try {
return JSON.parse(res).generated_text;
+ //return JSON.stringify(res, null, 2);
} catch (e) {
console.error(e);
console.log(res);
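Regarding the Chat-vs-API quality gap: one guess (unverified) is that Hugging Chat applies each model's chat template before inference, while the Serverless Inference API passes `inputs` through verbatim, so instruct-tuned models never see the special tokens they were trained on. A sketch of wrapping the query in a Llama-2-style template before building the payload (the template is an assumption and varies per model family):

```js
// Assumption: Llama-2/CodeLlama instruct models expect the
// [INST] ... [/INST] wrapper; sending raw text may explain why
// the API output is so much worse than Hugging Chat's.
function wrapInstPrompt(query) {
    return `<s>[INST] ${query} [/INST]`;
}
const payload = JSON.stringify({
    inputs: wrapInstPrompt(query)
});
```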
I started running a bunch of tests of auto mode on ~100 crackmes. I don't have full benchmarks yet, but anything other than sonnet-3.5 hardly ever finds any solutions. gpt-4o gets a few. Gemini is a giant hit or miss. None of the open-source ones even get anywhere close.
Description
I have been playing with pseudo-C (pdc) decompilation for an STM8 codebase on the HuggingFace web-based Chat, for free:
Unfortunately, when calling the API endpoint with the same model (or different ones), I'm hitting this:
For the latter I'm using the same Bearer token I use for the chat, so I'm not entirely sure what's going on. Free-tier limits are a bit vague anyway, according to this forum thread: https://discuss.huggingface.co/t/api-limits-on-free-inference-api/57711/5
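If it helps narrow this down, the HTTP status alone should separate the failure modes: the Serverless Inference API returns 503 with an `estimated_time` field while a model is loading, and 429 when rate limits kick in. A probe sketch, reusing `hfModel`/`hfKey` from r2aiHuggingFace (run it however decai runs its other curl commands):

```js
// Sketch: report only the HTTP status code, to tell a free-tier
// rate limit (429) apart from a cold model (503) or a gated one (403).
const probecmd = `curl -s -o /dev/null -w "%{http_code}" ` +
    `https://api-inference.huggingface.co/models/${hfModel} ` +
    `-H "Authorization: Bearer ${hfKey}" ` +
    `-H "Content-Type: application/json" ` +
    `-d '{"inputs": "hi"}'`;
// 200 = OK, 503 = model loading (body carries estimated_time),
// 429 = rate limited, 403 = gated/unauthorized model.
```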
Also getting strange (capacity?) errors for smaller models:
In any case, I hope this addition helps folks who do pay for this service?