
Model Stops Generating Prematurely with llamafactory-cli chat #6256

Closed
Sherlock1956 opened this issue Dec 5, 2024 · 3 comments
Labels
solved This problem has been already solved

Comments

@Sherlock1956

I recently fine-tuned a LLaMA model and successfully exported it as a new model. However, when I attempt to chat with the model using the llamafactory-cli chat inference.yaml command, the generated responses consistently stop prematurely, even though I am certain there should be more information in the output.

I suspect this behavior is caused by a small max_length setting. I tried adding a max_length parameter to the YAML configuration file, but it didn’t resolve the issue. After searching through the documentation, I couldn’t find where or how to properly set this parameter.

Here is my YAML configuration file:

### examples/inference/llama3_lora_sft.yaml
model_name_or_path: /data/llama-mesh
adapter_name_or_path: /data/llama-Factory/test
template: default
finetuning_type: lora
infer_backend: huggingface #choices: [huggingface, vllm]
max_length: 8000
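
For reference, a minimal sketch of the length-related lines, under the assumption that LLaMA-Factory also accepts a `max_new_tokens` generating argument that caps only the model's reply, while `max_length` counts the prompt plus the reply:

### sketch: length-related generating arguments (assumptions noted above)
max_new_tokens: 4096   # assumed to cap only the generated continuation
# max_length: 8000     # caps prompt + continuation together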

Could you please help me identify the correct way to configure the max_length parameter or any other settings that might be causing this issue?

Thank you!

github-actions bot added the "pending" (This problem is yet to be addressed) label on Dec 5, 2024
@hiyouga
Owner

hiyouga commented Dec 5, 2024

I think the short responses are caused by overfitting. Try cleaning your dataset and tuning the hyperparameters.

hiyouga added the "solved" (This problem has been already solved) label and removed the "pending" label on Dec 5, 2024
hiyouga closed this as completed on Dec 5, 2024
@Sherlock1956
Author

Thank you for your quick response! However, I don’t think the issue is caused by overfitting. When I use the following script to run the model, it consistently generates a complete answer:

from transformers import pipeline

messages = [
    {"role": "user", "content": "Create a 3D OBJ file and MANO parameters for the hand interacting with this object using the following description: A hand is holding a blue cup."},
]

# Load the fine-tuned model and generate with a generous length budget
pipe = pipeline("text-generation", model="/data/llama-mesh_ft", device_map="auto")
print(pipe(messages, max_length=8000))

As shown in the first figure, the generated message includes both the “v” (vertices) and “f” (faces) parts, which are expected in an OBJ file.
[first figure: screenshot of the complete output, containing both "v" and "f" lines]
However, when I use llamafactory-cli chat to generate the OBJ file, the output only contains a few “v” parts, with no “f” parts, as seen in the second figure:
[second figure: screenshot of the truncated output, containing only a few "v" lines]
Could it be that the llamafactory-cli chat does not have a max_length parameter setting? I’m still new to LlamaFactory, so I might have missed something.
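
For comparison, a minimal sketch of the same pipeline call using `max_new_tokens`, which in transformers bounds only the generated continuation (whereas `max_length` also counts the prompt tokens); the model path is kept from the script above:

from transformers import pipeline

messages = [
    {"role": "user", "content": "Create a 3D OBJ file and MANO parameters for the hand interacting with this object using the following description: A hand is holding a blue cup."},
]

pipe = pipeline("text-generation", model="/data/llama-mesh_ft", device_map="auto")

# max_new_tokens caps only the newly generated tokens, so a long prompt
# cannot eat into the generation budget the way it can with max_length.
print(pipe(messages, max_new_tokens=8000))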

@hiyouga
Owner

hiyouga commented Dec 5, 2024

Check the template settings and read #4614.
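
For illustration, a sketch of the kind of template change being suggested, assuming the base model is a Llama-3 derivative so that the matching chat template name in LLaMA-Factory would be `llama3` rather than `default`; a mismatched template can lead the model to emit its end-of-turn token too early, which would also produce truncated replies:

### sketch: only the line that changes relative to the config above
template: llama3   # assumed template name matching the base model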
