I recently fine-tuned a LLaMA model and successfully exported it as a new model. However, when I attempt to chat with the model using the llamafactory-cli chat inference.yaml command, the generated responses consistently stop prematurely, even though I am certain there should be more information in the output.
I suspect this behavior is caused by a small max_length setting. I tried adding a max_length parameter to the YAML configuration file, but it didn’t resolve the issue. After searching through the documentation, I couldn’t find where or how to properly set this parameter.
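For context, the change I tried looks roughly like the sketch below. The model path is a placeholder rather than my real one, and the other keys simply follow what I believe is the standard layout of the LLaMA-Factory inference examples, so please correct me if max_length is not even the right key here:

model_name_or_path: /path/to/exported_model   # placeholder, not my actual path
template: llama3                               # placeholder template name
infer_backend: huggingface
max_length: 8000                               # the setting I tried adding; it did not seem to take effect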
Thank you for your quick response! However, I don’t think the issue is caused by overfitting. When I use the following script to run the model, it consistently generates a complete answer:
from transformers import pipeline

messages = [
    {"role": "user", "content": "Create a 3D OBJ file and MANO parameters for the hand interacting with this object using the following description: A hand is holding a blue cup."},
]

# Load the exported model; max_length counts prompt plus generated tokens,
# so 8000 leaves enough room for the full OBJ output.
pipe = pipeline("text-generation", model="/data/llama-mesh_ft", device_map="auto")
print(pipe(messages, max_length=8000))
As shown in the first figure, the generated message includes both the “v” (vertices) and “f” (faces) parts, which are expected in an OBJ file.
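To be concrete about what I mean by the "v" and "f" parts: a complete OBJ body interleaves vertex lines and face lines, roughly like the following (the coordinates and indices here are just illustrative):

v 0.031 0.125 -0.062
v 0.047 0.141 -0.058
v 0.052 0.119 -0.071
f 1 2 3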
However, when I use llamafactory-cli chat to generate the OBJ file, the output only contains a few “v” parts, with no “f” parts, as seen in the second figure:
Could it be that llamafactory-cli chat does not expose a max_length setting? I'm still new to LLaMA-Factory, so I might have missed something.
Here is my YAML configuration file:
Could you please help me identify the correct way to configure the max_length parameter or any other settings that might be causing this issue?
Thank you!