Streamline embeddings from "non-embedding" models #8087
Conversation
llama.cpp

```diff
@@ -12618,14 +12618,15 @@ static int llama_decode_internal(
     std::vector<llama_seq_id *> seq_id_arr;
     std::vector<std::vector<llama_seq_id>> seq_id;

+    // this indicates we are doing pooled embedding, so we ignore batch.logits and output all tokens
+    bool embed_pooled = cparams.embeddings && cparams.pooling_type != LLAMA_POOLING_TYPE_NONE;
```
Naming this `embd_pooled` seems more in line with the usage of the `embd` identifier in `llama.cpp`:

```suggestion
const bool embd_pooled = cparams.embeddings && cparams.pooling_type != LLAMA_POOLING_TYPE_NONE;
```
yup, agreed.
Still looks good to me.
@iamlemec, you'll need to merge master into this branch first to resolve the conflicts; some files have moved.
@compilade cool! just rebased to master
The goal here is to get the big embedding models at the top of the MTEB leaderboard working. There are two changes:

1. `batch.logits` is fully ignored for pooled embeddings.
2. Adds `attention_type` to `llama_context_params` that allows for causal, non-causal, or unspecified (model default).

With this PR, we can get accurate results (matching HF) from at least the number 2 spot, `gte-Qwen2-7B-instruct`. For instance, with the command:

```shell
./llama-embedding -m gte-qwen2-7b-instruct-f16.gguf -p "hello world" -ngl 99 --pooling last --attention non-causal -c 512
```