
Model Stops Generating Prematurely with llamafactory-cli chat #6256

Closed
Sherlock1956 opened this issue Dec 5, 2024 · 3 comments
Labels
solved This problem has been already solved

Comments

@Sherlock1956

I recently fine-tuned a LLaMA model and successfully exported it as a new model. However, when I attempt to chat with the model using the llamafactory-cli chat inference.yaml command, the generated responses consistently stop prematurely, even though I am certain there should be more information in the output.

I suspect this behavior is caused by a small max_length setting. I tried adding a max_length parameter to the YAML configuration file, but it didn’t resolve the issue. After searching through the documentation, I couldn’t find where or how to properly set this parameter.

Here is my YAML configuration file:

### examples/inference/llama3_lora_sft.yaml
model_name_or_path: /data/llama-mesh
adapter_name_or_path: /data/llama-Factory/test
template: default
finetuning_type: lora
infer_backend: huggingface #choices: [huggingface, vllm]
max_length: 8000
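
For reference, a minimal sketch of the length-related lines, under the assumption that LLaMA-Factory also accepts a `max_new_tokens` generating argument that caps only the model's reply, while `max_length` counts the prompt plus the reply:

### sketch: length-related generating arguments (assumptions noted above)
max_new_tokens: 4096   # assumed to cap only the generated continuation
# max_length: 8000     # caps prompt + continuation together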

Could you please help me identify the correct way to configure the max_length parameter or any other settings that might be causing this issue?

Thank you!

github-actions bot added the "pending" (This problem is yet to be addressed) label on Dec 5, 2024
@hiyouga
Owner

hiyouga commented Dec 5, 2024

I think the short responses are caused by overfitting. Try cleaning your dataset and tuning the hyperparameters.

hiyouga added the "solved" (This problem has been already solved) label and removed the "pending" label on Dec 5, 2024
hiyouga closed this as completed on Dec 5, 2024
@Sherlock1956
Author

Thank you for your quick response! However, I don’t think the issue is caused by overfitting. When I use the following script to run the model, it consistently generates a complete answer:

from transformers import pipeline

messages = [
    {"role": "user", "content": "Create a 3D OBJ file and MANO parameters for the hand interacting with this object using the following description: A hand is holding a blue cup."},
]

# Load the fine-tuned model and generate with a generous length budget
pipe = pipeline("text-generation", model="/data/llama-mesh_ft", device_map="auto")
print(pipe(messages, max_length=8000))

As shown in the first figure, the generated message includes both the “v” (vertices) and “f” (faces) parts, which are expected in an OBJ file.
[first figure: screenshot of the complete output, containing both "v" and "f" lines]
However, when I use llamafactory-cli chat to generate the OBJ file, the output only contains a few “v” parts, with no “f” parts, as seen in the second figure:
[second figure: screenshot of the truncated output, containing only a few "v" lines]
Could it be that the llamafactory-cli chat does not have a max_length parameter setting? I’m still new to LlamaFactory, so I might have missed something.
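
For comparison, a minimal sketch of the same pipeline call using `max_new_tokens`, which in transformers bounds only the generated continuation (whereas `max_length` also counts the prompt tokens); the model path is kept from the script above:

from transformers import pipeline

messages = [
    {"role": "user", "content": "Create a 3D OBJ file and MANO parameters for the hand interacting with this object using the following description: A hand is holding a blue cup."},
]

pipe = pipeline("text-generation", model="/data/llama-mesh_ft", device_map="auto")

# max_new_tokens caps only the newly generated tokens, so a long prompt
# cannot eat into the generation budget the way it can with max_length.
print(pipe(messages, max_new_tokens=8000))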

@hiyouga
Owner

hiyouga commented Dec 5, 2024

Check the template settings and read #4614.
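
For illustration, a sketch of the kind of template change being suggested, assuming the base model is a Llama-3 derivative so that the matching chat template name in LLaMA-Factory would be `llama3` rather than `default`; a mismatched template can lead the model to emit its end-of-turn token too early, which would also produce truncated replies:

### sketch: only the line that changes relative to the config above
template: llama3   # assumed template name matching the base model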
