Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline embeddings from "non-embedding" models #8087

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

iamlemec
Copy link
Collaborator

The goal here is to get the big embedding models at the top of the MTEB leaderboard working. There are two changes:

  • Fix an inconsistency in output counting for u_batches. Now batch.logits is fully ignored for pooled embeddings.
  • Add an attention_type to llama_contex_params that allows for causal, non-causal, or unspecified (model default).

With this PR, we can get accurate results (matching HF) from at least the number 2 spot gte-Qwen2-7B-instruct. For instance, with the command:

./llama-embedding -m gte-qwen2-7b-instruct-f16.gguf -p "hello world" -ngl 99 --pooling last --at
tention non-causal -c 512

llama.cpp Outdated
@@ -12618,14 +12618,15 @@ static int llama_decode_internal(
std::vector<llama_seq_id *> seq_id_arr;
std::vector<std::vector<llama_seq_id>> seq_id;

// this indicates we are doing pooled embedding, so we ignore batch.logits and output all tokens
bool embed_pooled = cparams.embeddings && cparams.pooling_type != LLAMA_POOLING_TYPE_NONE;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming this embd_pooled seems more inline with the usage of the embd identifier in llama.cpp:

Suggested change
bool embed_pooled = cparams.embeddings && cparams.pooling_type != LLAMA_POOLING_TYPE_NONE;
const bool embd_pooled = cparams.embeddings && cparams.pooling_type != LLAMA_POOLING_TYPE_NONE;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, agreed.

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Jun 24, 2024
common/common.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@compilade compilade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good to me.

@iamlemec, you'll need to merge master into to this first to resolve the conflicts; some files have moved.

@iamlemec
Copy link
Collaborator Author

@compilade cool! just rebased to master

@ggerganov ggerganov merged commit d12f781 into ggerganov:master Jul 5, 2024
53 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 5, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 6, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 8, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 11, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants