Add support for HuggingFace 🤗 inference API #65

Merged (1 commit) on Oct 2, 2024

Conversation

@brainstorm (Contributor) commented on Oct 2, 2024

Checklist

  • Closing issues: #issue
  • Mark this if you consider it ready to merge
  • I've added tests (optional)
  • I wrote some documentation

Description

I have been playing with pseudo-C (pdc) decompilation for an STM8 codebase on the HuggingFace web-based Chat, for free:

[Screenshot: HuggingChat output, 2024-10-02 9:07 PM]

Unfortunately, when calling the API endpoint with the same model (or different ones), I get this:

[0x0000807f]> s 0x0000833c
[0x0000833c]> decai -d
{"error":"Model requires a Pro subscription; check out hf.co/pricing to learn more. Make sure to include your HF token in your query."}

For the API call I'm using the same Bearer token that works for the chat, so I'm not entirely sure what's going on. Free-tier limits are a bit vague anyway, according to this forum thread: https://discuss.huggingface.co/t/api-limits-on-free-inference-api/57711/5
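For reference, this is roughly how to reproduce the call outside decai and check whether the error comes from the account/tier rather than from how decai builds the request. A minimal sketch, assuming Node 18+ with built-in fetch, run as an ES module; `HF_TOKEN` and the model id are placeholders (use your own token and one of the models tried here):

```js
// Minimal reproduction of the Serverless Inference API call outside decai.
// HF_TOKEN should hold the same value as ~/.r2ai.huggingface-key.
const token = process.env.HF_TOKEN;
const model = "meta-llama/Llama-3.1-8B-Instruct"; // example model, swap as needed

const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${token}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ inputs: "int main(void) { return 0; }" }),
});
// A 4xx body here points at the account/tier, not at decai's request handling.
console.log(res.status, await res.text());
```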

Also getting strange (capacity?) errors for smaller models:

[0x0000833c]> decai -d
{"error":"Model meta-llama/Llama-3.2-1B-Instruct is currently loading","estimated_time":98.86515045166016}

In any case, I hope this addition helps folks who do pay for the service.

@brainstorm (Contributor, Author) commented on Oct 2, 2024

Or perhaps there's a way to somehow hit the HuggingFace Chat API (https://huggingface.co/docs/text-generation-inference ??) instead of the arguably more official Serverless Inference API? 🤔
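If that route is usable, the request would be shaped like an OpenAI-style chat call rather than a raw `inputs` string. A sketch of what I have in mind, assuming the model is exposed through the OpenAI-compatible `/v1/chat/completions` route that the TGI docs describe (whether the serverless endpoint actually exposes it for a given model is exactly what I'd need to verify):

```js
// Hypothetical chat-completions style call; route availability is an assumption.
const token = process.env.HF_TOKEN;
const model = "meta-llama/Llama-3.1-8B-Instruct"; // example model

const chat = await fetch(
  `https://api-inference.huggingface.co/models/${model}/v1/chat/completions`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "Simplify this pseudo-C:\nint main(void) { return 0; }" }],
      max_tokens: 1024,
    }),
  }
);
console.log(chat.status, await chat.text());
```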

@trufae merged commit d70258b into radareorg:master on Oct 2, 2024 (1 check passed)
@trufae (Contributor) commented on Oct 2, 2024

Looks good to me! The "huggingface" API name is a bit long, so I would suggest adding "hf" as an alias. And yeah, I guess it's possible to hit this endpoint without any API key, but not sure we want to play dirty with them :D

@brainstorm (Contributor, Author) commented on Oct 11, 2024

Hi @trufae ... you were right, even with the Pro key, the model performance is absolute rubbish compared to Claude :___(

I've systematically tested the different models supported in the PRO version and they are light years away from Claude's output :_/

What is bizarre, though, is that using Hugging Chat the output looks very reasonable, even without PRO 🤷🏻 ... maybe we are formatting the input completely wrong through the API endpoint and the model gets confused?
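One concrete difference I can think of: Hugging Chat presumably applies the model's chat template before generation, while the raw `inputs` string we POST to the text-generation endpoint goes in untemplated, so an instruct model never sees its expected role markers. A rough illustration (the markers below are approximately the Llama-3 instruct template; the exact strings live in the model's tokenizer config, so this is purely a sketch, not what decai does today):

```js
// Illustrative only: wrap the prompt in (approximately) Llama-3 instruct
// chat-template markers before sending it as a raw `inputs` string.
function llama3Wrap(userPrompt) {
  return "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n" +
         userPrompt +
         "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n";
}

const payload = JSON.stringify({
  inputs: llama3Wrap("Simplify this pseudo-C:\nint main(void) { return 0; }"),
  parameters: { max_new_tokens: 1024, return_full_text: false },
});
console.log(payload);
```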

I might explore the differences between Hugging Chat and the Serverless Inference API in more detail if we want this to work. I've made the following changes while doing some tests; I thought you might be interested in reviewing them:

diff --git a/decai/decai.r2.js b/decai/decai.r2.js
index 381aba0..b953a23 100644
--- a/decai/decai.r2.js
+++ b/decai/decai.r2.js
@@ -38,11 +38,12 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
     let decaiApi = "r2"; // uses /cmd endpoint
     let decaiCommands = "pdc";
     let decaiLanguage = "C";
-    let decaiDebug = false;
+    let decaiDebug = true;
     let decaiContextFile = "";
     let lastOutput = "";
     let decaiCache = false;
     let decprompt = "Only show the code with no explanation or introductions. Simplify the code: - take function arguments from comment - remove dead assignments - refactor goto with for/if/while - use better names for variables - simplify as much as possible";
+//    let decprompt = "The following will be pseudo-C code. Your task is to simplify the code, take function arguments from comments, remove dead assignments, refactor goto with for/if/while, use better names for variables and simplify as much as possible. The output should be properly indented for an 80 column terminal";
     // decprompt += ", comments in function calls may replace arguments and remove unnecessary early variable assignments that happen"
 
     function decaiEval(arg) {
@@ -177,24 +178,34 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
     }
     function r2aiHuggingFace(msg, hideprompt) {
         const hfKey = r2.cmd("'cat ~/.r2ai.huggingface-key").trim();
-        const hfModel = "deepseek-ai/DeepSeek-Coder-V2-Instruct";
-        //const hfModel = "meta-llama/Llama-3.1-8B-Instruct";
-        //const hfModel = "meta-llama/Llama-3.2-1B-Instruct";
-        //const hfModel = "Qwen/Qwen2.5-72B-Instruct";
+
+        // Supported models on the PRO subscription: https://github.com/huggingface/hub-docs/blob/main/docs/api-inference/supported-models.md#what-do-i-get-with-a-pro-subscription
+        // ... or perhaps those are supported now?: https://huggingface.co/blog/inference-pro#supported-models ... confusing (outdated/contradicting) docs
+        //const hfModel = "deepseek-ai/DeepSeek-Coder-V2-Instruct";  // Never loads the model, it's always "cold"
+        //const hfModel = "meta-llama/Llama-3.1-8B-Instruct";        // Hallucinates with things like: "BlueFin Bluetooth 5.0 Low Energy Chip from Nordic Semiconductor"
+        //const hfModel = "meta-llama/Llama-3.2-1B-Instruct";        // Not right
+        //const hfModel = "Qwen/Qwen2.5-72B-Instruct";               // Stops halfway a seemingly correct-ish output?
+        const hfModel = "codellama/CodeLlama-13b-hf";
+        //const hfModel = "codellama/CodeLlama-34b-Instruct-hf";     // Absolute rubbish
+        //const hfModel = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"; // Lazy, doesn't even try to produce code, just describes it vaguely in prose
+
         if (hfKey === '') {
             return "Cannot read ~/.r2ai.huggingface-key";
         }
-        const query = hideprompt? msg: decprompt + ", Explain this pseudocode in " + decaiLanguage + "\n" + msg;
+
+        const query = hideprompt
+            ? msg
+            : `${decprompt}, Explain this pseudocode in ${decaiLanguage}\n${msg}`;
+
         const payload = JSON.stringify({
-            inputs: query,
-            parameters: {
-                max_new_tokens: 5128
-            }
+            inputs: query
         });
-        const curlcmd = `curl -s https://api-inference.huggingface.co/models/${hfModel}
-            -H "Authorization: Bearer ${hfKey}"
-            -H "Content-Type: application/json"
+        const curlcmd = `curl -X POST -s https://api-inference.huggingface.co/models/${hfModel} \
+            -H "Authorization: Bearer ${hfKey}" \
+            -H "Content-Type: application/json" \
+            -H "x-wait-for-model: true" \
             -d '${payload}'`.replace(/\n/g, "");
+
         //if (decaiDebug) {
         //     console.log(curlcmd);
         //}
@@ -207,6 +218,7 @@ You can also make r2ai -w talk to an 'r2ai-server' using this line:
 
         try {
             return JSON.parse(res).generated_text;
+            //return JSON.stringify(res, null, 2);
         } catch (e) {
             console.error(e);
             console.log(res);

@dnakov (Collaborator) commented on Oct 11, 2024

I started running a bunch of tests of auto mode on ~100 crackmes. I don't have full benchmarks yet, but anything other than sonnet-3.5 hardly ever finds any solutions; gpt-4o gets a few, and Gemini is a giant hit or miss. None of the open-source ones even get anywhere close.
I'm almost thinking we should have a tiered list of models that we know can work for certain tasks, so people don't bother trying some shitty model and then giving up.
