Add support for HuggingFace 🤗 inference API #65

Merged (1 commit) on Oct 2, 2024

Conversation

@brainstorm (Contributor) commented on Oct 2, 2024

Checklist

  • Closing issues: #issue
  • Mark this if you consider it ready to merge
  • I've added tests (optional)
  • I wrote some documentation

Description

I have been playing with pseudo-C (pdc) decompilation for an STM8 codebase on the HuggingFace web-based Chat, for free:

[Screenshot: HuggingChat output, 2024-10-02 9:07 PM]

Unfortunately, when calling the API endpoint with the same model (or different ones), I get this:

[0x0000807f]> s 0x0000833c
[0x0000833c]> decai -d
{"error":"Model requires a Pro subscription; check out hf.co/pricing to learn more. Make sure to include your HF token in your query."}

For the API call I'm using the same Bearer token that works for the chat, so I'm not entirely sure what's going on. Free-tier limits are a bit vague anyway, according to this forum thread: https://discuss.huggingface.co/t/api-limits-on-free-inference-api/57711/5
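For reference, this is roughly how to reproduce the call outside decai and check whether the error comes from the account/tier rather than from how decai builds the request. A minimal sketch, assuming Node 18+ with built-in fetch, run as an ES module; `HF_TOKEN` and the model id are placeholders (use your own token and one of the models tried here):

```js
// Minimal reproduction of the Serverless Inference API call outside decai.
// HF_TOKEN should hold the same value as ~/.r2ai.huggingface-key.
const token = process.env.HF_TOKEN;
const model = "meta-llama/Llama-3.1-8B-Instruct"; // example model, swap as needed

const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${token}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ inputs: "int main(void) { return 0; }" }),
});
// A 4xx body here points at the account/tier, not at decai's request handling.
console.log(res.status, await res.text());
```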

Also getting strange (capacity?) errors for smaller models:

[0x0000833c]> decai -d
{"error":"Model meta-llama/Llama-3.2-1B-Instruct is currently loading","estimated_time":98.86515045166016}

In any case, I hope this addition helps folks who do pay for the service.

@brainstorm (Contributor, Author) commented on Oct 2, 2024

Or perhaps there's a way to somehow hit the HuggingFace Chat API (https://huggingface.co/docs/text-generation-inference ??) instead of the arguably more official Serverless Inference API? 🤔
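If that route is usable, the request would be shaped like an OpenAI-style chat call rather than a raw `inputs` string. A sketch of what I have in mind, assuming the model is exposed through the OpenAI-compatible `/v1/chat/completions` route that the TGI docs describe (whether the serverless endpoint actually exposes it for a given model is exactly what I'd need to verify):

```js
// Hypothetical chat-completions style call; route availability is an assumption.
const token = process.env.HF_TOKEN;
const model = "meta-llama/Llama-3.1-8B-Instruct"; // example model

const chat = await fetch(
  `https://api-inference.huggingface.co/models/${model}/v1/chat/completions`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "Simplify this pseudo-C:\nint main(void) { return 0; }" }],
      max_tokens: 1024,
    }),
  }
);
console.log(chat.status, await chat.text());
```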

@trufae merged commit d70258b into radareorg:master on Oct 2, 2024 (1 check passed)
@trufae (Contributor) commented on Oct 2, 2024

Looks good to me! The "huggingface" API name is a bit long, so I would suggest adding "hf" as an alias. And yeah, I guess it's possible to hit this endpoint without any API key, but not sure we want to play dirty with them :D

@brainstorm (Contributor, Author) commented on Oct 11, 2024

Hi @trufae ... you were right, even with the Pro key, the model performance is absolute rubbish compared to Claude :___(

I've systematically tested the different models supported in the PRO version and they are light years away from Claude's output :_/

What is bizarre, though, is that using Hugging Chat the output looks very reasonable, even without PRO 🤷🏻 ... maybe we are formatting the input completely wrong through the API endpoint and the model gets confused?
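One concrete difference I can think of: Hugging Chat presumably applies the model's chat template before generation, while the raw `inputs` string we POST to the text-generation endpoint goes in untemplated, so an instruct model never sees its expected role markers. A rough illustration (the markers below are approximately the Llama-3 instruct template; the exact strings live in the model's tokenizer config, so this is purely a sketch, not what decai does today):

```js
// Illustrative only: wrap the prompt in (approximately) Llama-3 instruct
// chat-template markers before sending it as a raw `inputs` string.
function llama3Wrap(userPrompt) {
  return "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n" +
         userPrompt +
         "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n";
}

const payload = JSON.stringify({
  inputs: llama3Wrap("Simplify this pseudo-C:\nint main(void) { return 0; }"),
  parameters: { max_new_tokens: 1024, return_full_text: false },
});
console.log(payload);
```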

I might explore the differences between Hugging Chat and the Serverless Inference API in more detail if we want this to work. I've made the following changes while doing some tests; I thought you might be interested in reviewing them:

diff --git a/decai/decai.r2.js b/decai/decai.r2.js
index 381aba0..b953a23 100644
--- a/decai/decai.r2.js
+++ b/decai/decai.r2.js
@@ -38,11 +38,12 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
     let decaiApi = "r2"; // uses /cmd endpoint
     let decaiCommands = "pdc";
     let decaiLanguage = "C";
-    let decaiDebug = false;
+    let decaiDebug = true;
     let decaiContextFile = "";
     let lastOutput = "";
     let decaiCache = false;
     let decprompt = "Only show the code with no explanation or introductions. Simplify the code: - take function arguments from comment - remove dead assignments - refactor goto with for/if/while - use better names for variables - simplify as much as possible";
+//    let decprompt = "The following will be pseudo-C code. Your task is to simplify the code, take function arguments from comments, remove dead assignments, refactor goto with for/if/while, use better names for variables and simplify as much as possible. The output should be properly indented for an 80 column terminal";
     // decprompt += ", comments in function calls may replace arguments and remove unnecessary early variable assignments that happen"
 
     function decaiEval(arg) {
@@ -177,24 +178,34 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
     }
     function r2aiHuggingFace(msg, hideprompt) {
         const hfKey = r2.cmd("'cat ~/.r2ai.huggingface-key").trim();
-        const hfModel = "deepseek-ai/DeepSeek-Coder-V2-Instruct";
-        //const hfModel = "meta-llama/Llama-3.1-8B-Instruct";
-        //const hfModel = "meta-llama/Llama-3.2-1B-Instruct";
-        //const hfModel = "Qwen/Qwen2.5-72B-Instruct";
+
+        // Supported models on the PRO subscription: https://github.com/huggingface/hub-docs/blob/main/docs/api-inference/supported-models.md#what-do-i-get-with-a-pro-subscription
+        // ... or perhaps those are supported now?: https://huggingface.co/blog/inference-pro#supported-models ... confusing (outdated/contradicting) docs
+        //const hfModel = "deepseek-ai/DeepSeek-Coder-V2-Instruct";  // Never loads the model, it's always "cold"
+        //const hfModel = "meta-llama/Llama-3.1-8B-Instruct";        // Hallucinates with things like: "BlueFin Bluetooth 5.0 Low Energy Chip from Nordic Semiconductor"
+        //const hfModel = "meta-llama/Llama-3.2-1B-Instruct";        // Not right
+        //const hfModel = "Qwen/Qwen2.5-72B-Instruct";               // Stops halfway a seemingly correct-ish output?
+        const hfModel = "codellama/CodeLlama-13b-hf";
+        //const hfModel = "codellama/CodeLlama-34b-Instruct-hf";     // Absolute rubbish
+        //const hfModel = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"; // Lazy, doesn't even try to produce code, just describes it vaguely in prose
+
         if (hfKey === '') {
             return "Cannot read ~/.r2ai.huggingface-key";
         }
-        const query = hideprompt? msg: decprompt + ", Explain this pseudocode in " + decaiLanguage + "\n" + msg;
+
+        const query = hideprompt
+            ? msg
+            : `${decprompt}, Explain this pseudocode in ${decaiLanguage}\n${msg}`;
+
         const payload = JSON.stringify({
-            inputs: query,
-            parameters: {
-                max_new_tokens: 5128
-            }
+            inputs: query
         });
-        const curlcmd = `curl -s https://api-inference.huggingface.co/models/${hfModel}
-            -H "Authorization: Bearer ${hfKey}"
-            -H "Content-Type: application/json"
+        const curlcmd = `curl -X POST -s https://api-inference.huggingface.co/models/${hfModel} \
+            -H "Authorization: Bearer ${hfKey}" \
+            -H "Content-Type: application/json" \
+            -H "x-wait-for-model: true" \
             -d '${payload}'`.replace(/\n/g, "");
+
         //if (decaiDebug) {
         //     console.log(curlcmd);
         //}
@@ -207,6 +218,7 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
 
         try {
             return JSON.parse(res).generated_text;
+            //return JSON.stringify(res, null, 2);
         } catch (e) {
             console.error(e);
             console.log(res);

@dnakov (Collaborator) commented on Oct 11, 2024

I started running a bunch of tests of auto mode on ~100 crackmes. I don't have full benchmarks yet, but anything other than sonnet-3.5 hardly ever finds any solutions; gpt-4o gets a few, and Gemini is a giant hit or miss. None of the open-source ones even get anywhere close.
I'm almost thinking we should have a tiered list of models that we know can work for certain tasks, so people don't bother trying some shitty model and then giving up.
