I am fine-tuning a conversational model on a domain-specific dataset. The dataset consists of structured dialog messages, and each conversation is labeled with a KTO tag (true/false). Until now I only had positive examples, but now I want to introduce negative samples.
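For concreteness, here is a rough sketch of what one of my records could look like, assuming a TRL-style KTO format with "prompt", "completion", and a boolean "label" field (true = desirable, false = undesirable); the exact field names and string contents are illustrative, not my real data:

```python
# One KTO-style record: a prompt, the model's completion (here, a function
# call), and a binary label saying whether that completion is desirable.
# Field names follow the TRL KTOTrainer convention; contents are made up.
positive_example = {
    "prompt": "User: Find me a pharmacy near Main Street that's open right now.",
    "completion": '<function_call>{"name": "search_address", '
                  '"arguments": {"query": "pharmacy near Main Street", '
                  '"open_at": "now"}}</function_call>',
    "label": True,  # correct: a pharmacy has opening hours, so open_at makes sense
}
```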
The LLM can use a couple of different functions, and one of those function calls is search_address, which has some required arguments the model has already learned from the structured dialog dataset. For some addresses there is an additional optional argument, open_at, similar to how Google lets you search for locations that are currently open.
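To make the setup clearer, here is a hypothetical schema for the tool, written in the common OpenAI-style function-calling format; everything except the search_address and open_at names is my assumption for illustration:

```python
# Hypothetical tool definition for search_address. Only "query" is required;
# "open_at" is optional and only meaningful for places with opening hours.
search_address_tool = {
    "name": "search_address",
    "description": "Search for an address or place.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Free-text address or place query.",
            },
            "open_at": {
                "type": "string",
                "description": "Optional; only meaningful for businesses with "
                               "opening hours, e.g. 'now' or a timestamp.",
            },
        },
        "required": ["query"],
    },
}
```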
For some locations, though, the argument doesn't make sense, e.g. streets or public locations in general. So I tried to introduce negative samples to train the model not to add open_at for those. But I can't seem to get the model to learn when to include it and when not to. My idea is that if I take a positive example and place its opposite (the negative sample) right after it, the model will learn the difference between right and wrong from the counterexamples (see the sketch below). I am not sure whether this works, though, and it also doubles my dataset, because every example gets a paired negative sample.
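Here is a minimal sketch of the paired construction I mean: the same prompt once with the desirable completion (label true) and once with the undesirable variant that wrongly adds open_at to a street query (label false). Again, the field names and contents are illustrative:

```python
# The same prompt appears twice: once labeled desirable, once undesirable.
# The only difference between the two completions is the open_at argument.
prompt = "User: Where is Baker Street?"

good = {
    "prompt": prompt,
    "completion": '<function_call>{"name": "search_address", '
                  '"arguments": {"query": "Baker Street"}}</function_call>',
    "label": True,   # a street has no opening hours, so open_at is omitted
}

bad = {
    "prompt": prompt,
    "completion": '<function_call>{"name": "search_address", '
                  '"arguments": {"query": "Baker Street", "open_at": "now"}}'
                  '</function_call>',
    "label": False,  # same call, but with the nonsensical open_at argument
}

paired_dataset = [good, bad]  # doubles the sample count per example
```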
So my question is: do you think this will work, or is there a better way to teach the model when to use this argument?