Replies: 1 comment
-
As GPT said :D. I can log the resulting prompt with this line of code:
-
I'm sorry, but I can't find any information on this written for a regular human :)
I successfully ran a normal fine-tuning with my dataset, and now I'm interested in doing the same with a reward model scoring responses in real time.
I've noticed that even Qwen 3B can give a good score for a prompt-response pair.
I'd like to know how to use it with LLaMA-Factory PPO fine-tuning, and how to prepare everything to get started.
I've found the line where the score is calculated (in discussion #1487), but it's not clear what the score value should look like.
I mean, the reward model should return a float value, but Qwen needs an additional prompt text around the pair to understand it and produce a score.
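For reference, here is roughly what I have in mind, just a minimal sketch and not LLaMA-Factory's actual reward-model API: wrap a small instruct model in a grading prompt and parse the generated text back into a float, which is the kind of scalar a PPO trainer expects as a reward. The model name, the grading template, and the `score_pair` helper are all my own placeholders.

```python
# Minimal sketch: use a small generative instruct model as a judge for a
# prompt-response pair and return a float score. Not LLaMA-Factory code;
# the model name and grading template are assumptions for illustration.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-3B-Instruct"  # placeholder; any small instruct model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype="auto", device_map="auto"
)


def score_pair(prompt: str, response: str) -> float:
    """Ask the model to rate the response from 0 to 10 and return a float."""
    messages = [
        {
            "role": "system",
            "content": "You are a strict grader. Reply with a single number from 0 to 10.",
        },
        {
            "role": "user",
            "content": f"Prompt:\n{prompt}\n\nResponse:\n{response}\n\nScore (0-10):",
        },
    ]
    # Build the chat-formatted input and generate a short, deterministic answer.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(input_ids, max_new_tokens=8, do_sample=False)
    # Keep only the newly generated tokens and pull the first number out of them.
    text = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    match = re.search(r"\d+(\.\d+)?", text)
    return float(match.group()) if match else 0.0


# Example usage: the returned float is what would be fed to PPO as the reward.
print(score_pair("Explain what PPO is.", "PPO is a reinforcement learning algorithm..."))
```

So my question is basically how to plug something like this (or a proper reward model checkpoint) into the LLaMA-Factory PPO stage, and what format that score has to be in.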